Sunday, December 11, 2016

Generics and you.

Sorry for the delay in writing. I've taken to writing a bit in the various Slacks I'm in, which lend themselves better to the day-to-day work of making things function. But they're not public, so here we are.

Speaking of which, the Slack denizens complained about the lack of documentation, so the GitHub readme now covers the basics of the language. It's mostly simple snippets to help define syntax and provide a gentle-ish introduction to phrases and the idioms of the language. And of course, writing them up gave me ample opportunity to find bugs.

Bug fixing over the last three months has largely focused on generics. There are three major places where these bugs came up: in my type system, in codegen to .NET, and in how generics interoperate with closures.

In Tangent, generics are represented by a ParameterDeclaration. It's a simple class that represents the name : type sort of declarations used throughout the language. The type for generics is the constraint - and for now, almost always any. Parameter declarations though aren't types. In the type system, to refer to a generic you would use one of two types - a GenericArgumentReferenceType or a GenericInferencePlaceholder. These are meant to line up with the .NET sort of implementation where generic parameters in function parameters can infer the type based on their arguments, and generic parameters in classes or as return values just reference some explicit specification. The core parameter declaration in a type declaration is that explicit specification.
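Spelled out in code, the shape of that representation looks roughly like this. This is a hedged Java sketch, since the real compiler is C#; the class names come from the post, but their members are my illustrative guess:

```java
public class GenericRepr {
    static class TangentType {}

    // name : constraint (e.g. `T : any`) - a declaration, not itself a type.
    static class ParameterDeclaration {
        final String name;
        final TangentType constraint;
        ParameterDeclaration(String name, TangentType constraint) {
            this.name = name;
            this.constraint = constraint;
        }
    }

    // Explicit reference to a generic parameter (class fields, return types).
    static class GenericArgumentReferenceType extends TangentType {
        final ParameterDeclaration target;
        GenericArgumentReferenceType(ParameterDeclaration target) { this.target = target; }
    }

    // Placeholder that infers the generic from a call-site argument.
    static class GenericInferencePlaceholder extends TangentType {
        final ParameterDeclaration target;
        GenericInferencePlaceholder(ParameterDeclaration target) { this.target = target; }
    }

    public static void main(String[] args) {
        ParameterDeclaration t = new ParameterDeclaration("T", new TangentType()); // T : any
        System.out.println(new GenericArgumentReferenceType(t).target.name);
        System.out.println(new GenericInferencePlaceholder(t).target.name);
    }
}
```

Both reference types point back at the same underlying declaration; they differ only in how the concrete type gets filled in.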

And that all fell apart when doing the actual implementation. Consider a simple class declaration.

pair (T) :> (x : T), (y : T) {   
  (this).x : T := x;       
  (this).y : T := y;       
}

T on the left side is the parameter declaration, and all of the other Ts are generic references, yes? Not so much. That compiles into C# code that looks something like this:

public class pair<T> {
  public T x;
  public T y;
}

public static pair<T> ctor<T>(T x, T y) {
  pair<T> result = new pair<T>();
  result.x = x;
  result.y = y;
  return result;
}

public static T getx<T>(pair<T> owner) {
  return owner.x;
}

public static void setx<T>(pair<T> owner, T value) {
  owner.x = value;
}

public static T gety<T>(pair<T> owner) {
  return owner.y;
}

public static void sety<T>(pair<T> owner, T value) {
  owner.y = value;
}

So while the T for the fields is an actual reference to the class's generic, the rest aren't. They get their generic info from the method they're tied to. And of course, since everything is named T, there was ample opportunity for me to get them mixed up.
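The same split shows up in any language with both class-level and method-level generics. A minimal Java analogue (helper names like `makePair` and `getX` are mine, purely illustrative):

```java
public class GenericScopes {
    // Class-level T: the fields reference Pair's own generic parameter.
    static class Pair<T> {
        T x;
        T y;
    }

    // Method-level T: a fresh generic parameter per function; it lines up
    // with Pair's T only through inference at the call site.
    static <T> Pair<T> makePair(T x, T y) {
        Pair<T> result = new Pair<>();
        result.x = x;
        result.y = y;
        return result;
    }

    static <T> T getX(Pair<T> owner) {
        return owner.x;
    }

    public static void main(String[] args) {
        Pair<Integer> p = makePair(1, 2);
        System.out.println(getX(p));
    }
}
```

Rename the method-level parameter to `U` and nothing changes, which is exactly the property that makes it easy to confuse the two when everything is called T.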

That is all terrible, and it gets worse. What happens when you toss closures into the picture?

Let's take a look at an implementation of the ternary operator (x?a:b) in Tangent:

(conditional : bool) ? (a : ~>T) | (b : ~>T) => T {
  : result : ~> T := b;
  if conditional result = a;
  result;
}

A little weird, but nothing outlandish. We have a phrase that takes a bool, a question mark, something that yields T (think Func<T> in C#), a pipe, and something that yields T again. That phrase will return whatever T is. The gotcha here is line 2 of the function. Because if statements are defined in the language, result = a is inferred to mean () => { result = a; }, meaning I need to close over result and a to make that work. For those of you lucky enough not to know how closures are implemented, it compiles into something like this:

public class Closure0<T> {
  public bool conditional;
  public Func<T> a;
  public Func<T> b;
  public Func<T> result;

  public void Implementation0() {
    result = a;
  }
}

public static T ternary<T>(bool conditional, Func<T> a, Func<T> b) {
  Closure0<T> scope = new Closure0<T>();
  scope.conditional = conditional;
  scope.a = a;
  scope.b = b;

  scope.result = b;
  ifImpl(scope.conditional, scope.Implementation0); // the user-defined `if` phrase
  return scope.result();
}

Which is gross on the best of days. But you'll also note that now we've got class-level generics again. And again some Ts are function-level generics and some Ts are class-level generics. Worse yet, they change depending on whether you're calling the closure variable from the closure scope or from the function scope. Bug city.

The good news is that the simple examples above now work. Also, I've gotten enough interop working that I can actually make .NET Lists and add things to them. That required a small change to the language to support explicit generic arguments in functions. That allows stupid code like this:

foo (T) => void {
  print "foo"
}

entrypoint => void {
  foo int
}

But also allows new List<int> to be a phrase that the language can interpret. Next on the docket is interfaces. The interface code I tried to do for the readme doesn't work, I can't use IEnumerable, and I certainly can't make Tangent types work with .NET interfaces or .NET types with Tangent interfaces. 

Saturday, September 10, 2016

The Cost of Pragmatism

Sorry for the delay in posting. As promised, I moved towards a bit more pragmatic focus with the language. Towards that goal, I made a tiny program:

a random number => int {
  new .NET System.Random.Next;
}

a random number less than (exclusive maximum : int) => int {
  new .NET System.Random.Next exclusive maximum;
}

a random number between (inclusive minimum : int) and (exclusive maximum : int) => int {
  new .NET System.Random.Next inclusive minimum exclusive maximum;
}

get a number from console => int {
  .NET System.Console.Write "Enter a number between 0 and 100: ";
  : input : string := .NET System.Console.ReadLine;
  int.Parse input;
}

provide user feedback based on (guess : int) and (target : int) => void {
  if guess = target 
    .NET System.Console.WriteLine "Correct!"
  else
    if guess < target
      .NET System.Console.Write "Sorry, your guess was too low. "
    else 
      .NET System.Console.Write "Sorry, your guess was too high. "
}

entrypoint => void {
  : target : int := a random number less than 100;
  : guess : int := 101;

  while not guess = target {
    guess = get a number from console;
    provide user feedback based on guess and target;
  }
}

A tiny console guessing game. Yes, that's what I've been working on for two months... At the end of the last post I hinted at performance problems in simple code with .NET imports. Turns out, that was a pretty big deal. Two months ago, the program above took more than 14 hours to compile (I got tired of waiting and killed it).

The problem was in the core of the language. The way Tangent works is to parse all of the declarations, and then use them to infer meaning. That inference engine was pretty straightforward. Assuming a list of tokens coming in, a target type, and a set of rules:

  if tokens.count == 1 && tokens[0].effective_type == target_type
    return tokens;
 
  for index = 0; index < tokens.count; ++index
    foreach rule in ruleset

      if rule matches tokens at index
        replace tokens with rule result, and recurse

There's some extra cruft in there for handling ambiguity, but the core of it is a shift-reduce parser in spirit. And that is great when you have a few dozen rules and 5-10 inputs. Now, with tens of thousands of rules and larger inputs, big-O reminds you who is boss.
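As a toy sketch of that loop (assuming a rule is just a token pattern plus a result token - a big simplification of the real engine):

```java
import java.util.ArrayList;
import java.util.List;

public class ToyReducer {
    record Rule(List<String> pattern, String result) {}

    // Try to reduce the token list to exactly [target] by scanning every
    // position against every rule and recursing - the naive version.
    static boolean reduce(List<String> tokens, String target, List<Rule> rules) {
        if (tokens.size() == 1 && tokens.get(0).equals(target)) {
            return true;
        }
        for (int i = 0; i < tokens.size(); i++) {
            for (Rule rule : rules) {
                int end = i + rule.pattern().size();
                if (end <= tokens.size() && tokens.subList(i, end).equals(rule.pattern())) {
                    // Replace the matched span with the rule's result, then recurse.
                    List<String> next = new ArrayList<>(tokens.subList(0, i));
                    next.add(rule.result());
                    next.addAll(tokens.subList(end, tokens.size()));
                    if (reduce(next, target, rules)) {
                        return true;
                    }
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
            new Rule(List.of("int", "+", "int"), "int"),
            new Rule(List.of("num"), "int"));
        System.out.println(reduce(List.of("num", "+", "num"), "int", rules));
    }
}
```

Every position times every rule, recursing on each match: the inner loop is where the linear scan over tens of thousands of rules becomes ruinous.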

What I've done so far is to remove the linear search from the ruleset. It now is a bit smarter about limiting the legal rules based on the first token it sees, and then the second, and so on. I've also removed sum-types from the language. As soon as things could be A or B, it started exploding the search combinations again due to how the language is set up. Since I only really wanted them for nullable types, and I realized I couldn't really use them for nullable types (since Optional<A> and Optional<B> aren't related, even if A and B are), they got dropped. Pragmatism first.
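One way to picture the first step of that fix is bucketing rules by their leading token, so a match attempt only scans rules that could possibly apply at that position. A hedged Java sketch, not the actual implementation:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RuleIndex {
    record Rule(List<String> pattern, String result) {}

    // Bucket rules by their first token; extending this per subsequent
    // token turns the index into a trie over rule patterns.
    static Map<String, List<Rule>> index(List<Rule> rules) {
        Map<String, List<Rule>> byFirst = new HashMap<>();
        for (Rule r : rules) {
            byFirst.computeIfAbsent(r.pattern().get(0), k -> new ArrayList<>()).add(r);
        }
        return byFirst;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
            new Rule(List.of("int", "+", "int"), "int"),
            new Rule(List.of("int", "*", "int"), "int"),
            new Rule(List.of("num"), "int"));
        Map<String, List<Rule>> byFirst = index(rules);
        // At a position whose token is "int", only two rules are candidates.
        System.out.println(byFirst.get("int").size());
        System.out.println(byFirst.get("num").size());
    }
}
```

With thousands of distinct leading tokens, the candidate set per position shrinks from "all rules" to a handful.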

Problem 2 was with my handling of closures. If you remember, while loops are not part of the language - they're defined in code as:

  while (condition: ~>bool) (body: ~>void) => void {
      if condition { body; while condition body; };
  }

Where ~> is the type operator to make things lazily evaluated. This means that not guess = target and the braces are their own separate delegates. And when I made them, I was closing around the locals twice, copying them in each time. So when I got the input from the console, it went into a different closure than the while conditional. So I got to take the slapdash closure code I hacked up ages ago and make it work properly. Oh, and I needed to figure out what "properly" was, since there aren't exactly a whole lot of good tutorials on the mechanics of implementing closures.

Problem 3 was smaller thankfully. Since numbers can't be part of identifiers in Tangent, I couldn't call Int32.Parse. Oops. I just hacked that in manually for now.

Problem 4 was a standard sort of bug. When running the game, it would tell you when you guessed the number, but wouldn't exit. I figured it was another closure issue, or perhaps something with tail calls in delegates. No, I had just implemented not as the CIL opcode not, which is bitwise not, which happily worked on (bool)1, yielding true again.
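The bug is easy to reproduce in any language with a bitwise not, given CIL's convention that false is 0 and anything nonzero is truthy. A small Java illustration:

```java
public class NotBug {
    public static void main(String[] args) {
        // Model the boolean the way CIL does: true == 1, truthy == nonzero.
        int trueVal = 1;

        int bitwiseNot = ~trueVal;               // 0xFFFFFFFE == -2, still nonzero
        int logicalNot = (trueVal == 0) ? 1 : 0; // what `not` should compile to

        System.out.println(bitwiseNot != 0); // the bug: not(true) is still truthy
        System.out.println(logicalNot != 0); // the intended behavior
    }
}
```

Bitwise not flips every bit, so `not true` came back nonzero and the loop never saw a false condition.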

Next is working on interface interop, performance improvements, or some other pragmatic step forward.

Wednesday, July 20, 2016

Basic Interop

(For those wondering, I did get delegate fields working. They were not too much trouble. You can read about them in the release notes.)

As promised, I'm taking a bit of a shift towards pragmatism in my work, meaning some efforts to do interop with existing .NET code. Doing that comes with some design decisions and coding challenges. Let's go through some of them, since they'll impact most anyone writing a compiler:


How can I do .NET interop without making it nigh impossible to target other languages?


Well, the back end of the compiler (the bits that take structured intermediate forms and generate CIL from them) is already highly coupled with .NET. That code can know all it wants about .NET. The front end though (the bits that take text and turn it into structured intermediate forms) doesn't know much about .NET. Even the built-in types are just placeholders that the back end ties to real ints and strings and such.

But because of how Tangent works, the parser needs to see all of the legal phrases so it can infer meaning in the code. That means I need something from .NET interop land to generate the phrases for the Tangent code to refer to. And that something needs to carry enough information so that the back end can turn the phrase back into .NET interaction.

In the end, I erred towards YAGNI. I mean, I'll be lucky if I get a dozen people to read this blog, let alone write code in the language. There's no need to support shifting target languages. But the engineer in me can't just couple that code needlessly, so it ended up as a mostly separate set of pieces with some interop-agnostic interfaces to tie it into the front end. The code that imports .NET code into the front end just sits in the front end for ease of implementation, but is mostly separate.

How can I deal with reference types when my language doesn't have null?


I'd really like to not have nulls ever, but .NET interop is going to require them somehow. I could make all reference types Optional<T> sorts of things, which makes the null very explicit, but the interop code verbose and unwieldy. I could make the .NET types nullable but not Tangent types, which kinda works, but will carry over the problems of null and add the leaky abstraction that Tangent types behave differently from interop types. I could just not allow Tangent code to assign or check null, using exceptions to handle null references... but that is terrible.

In the current implementation, I mostly went the optional route. Constructors return real types, this-params take real types, but all parameters and return values are Optional<T>, requiring explicit null handling in Tangent. It might suck.
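As a sketch of what that wrapping might look like (the method names here are hypothetical, with Java's `Optional` standing in for Tangent's):

```java
import java.util.Optional;

public class InteropWrapper {
    // Stand-in for a raw imported .NET method: it may return null.
    static String rawReadLine(boolean eof) {
        return eof ? null : "hello";
    }

    // The generated binding: reference-typed parameters and returns become
    // Optional<T>, forcing the caller to handle the null case explicitly.
    static Optional<String> readLine(boolean eof) {
        return Optional.ofNullable(rawReadLine(eof));
    }

    public static void main(String[] args) {
        System.out.println(readLine(false).orElse("<null>"));
        System.out.println(readLine(true).orElse("<null>"));
    }
}
```

Constructors and this-params skip the wrapping because a freshly constructed object, or the receiver of a call, can never be null.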

How can I deal with subtyping when my language doesn't have it?


Tangent only has interface implementation sorts of subtyping. The compiler doesn't support it, and since the inference engine needs to know that sort of stuff to know what a "legal" interpretation of a phrase is, the information needs to get there somehow. I could add subtyping, but adding something to the language just for interop seems dirty. I could not do subtyping and force explicit casts, but that'd be obnoxious. The tentative plan is to implement conversion functions so that the language can shift types upwards, and have the implementation be some sort of no-op that the back end compiler can deal with. That still will require some sort of inlining so that I'm not kneecapping performance due to my own stubbornness. (More on that later).
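The no-op conversion idea can be sketched like so (a Java analogue with made-up types; the real plan targets Tangent/.NET):

```java
public class Upcast {
    interface Animal {}
    static class Cat implements Animal {}

    // The tentative plan: one explicit conversion function per subtype
    // relation, whose body is a no-op the back end can inline away.
    static Animal asAnimal(Cat c) {
        return c; // no runtime work; exists only to satisfy the type system
    }

    public static void main(String[] args) {
        Animal a = asAnimal(new Cat());
        System.out.println(a instanceof Cat);
    }
}
```

The inference engine then treats the conversion like any other function, so subtyping never has to leak into it.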

The basics are there now, though. I still need to wire it into the command line interface, but in the test code you can specify a collection of assemblies, which the code will import as best it can. Non-generic types are largely there. Field access/assignment, properties, functions, constructors, extension methods - all there, for both instance and static forms. This test code will now compile and run properly with the right imports:

entrypoint => void {
  print "foo".Length;
  print "foobar".Length;
}

Which is awesome. No need to manually specify bindings to .NET functions any more, and the back end work was dead simple.

The bad news is that the above example takes about 50 seconds to compile and run. The other examples I've posted, including the ones doing weird partial specialization, took about 200 milliseconds. The addition of a few thousand phrases for the inference code to wade through will do that. It's not terribly surprising, but disappointing nonetheless.

I guess those optimizations aren't premature anymore...

Sunday, June 19, 2016

Fields

The next step after adding local variables was to extend that concept along to actual fields in classes. Not the sexiest feature in the world, but as the language pushes towards doing stuff, pragmatism will win. Fields (hopefully) look and act like you'd expect - they look like local variables without the preceding colon, and initialize when the class is created. Let's look at the test code:

foo :> new Foo(x: int) {
  (this).test : int := 42 + x;
  (this) => int { this.test } 
}

entrypoint => void {
  print new Foo 5;     // 47
  :x:foo := new Foo 5;
  print x;             // still 47
  x.test = 5;
  print x;             // 5
} 

The only weird part is that (this) bit in there. Since fields can be phrases like everything else in the language, you need to specify where the owner of the field is fetched from. The initializers can pull from the constructor parameters, and from other fields - though there is a bug currently where the compiler may not order the dependencies of initializers properly.

Slightly weird, but nothing ground breaking. You have fields, you can read from them and you can write to them. Now to something a little more weird. I also added support for fields to interfaces. They don't have initializers, but otherwise use the same syntax. Let's look at an example:

Pirate :> pirate (x: string) :< has name {
  (this).name : string := x;
}

Doctor :> doctor (x: string) :< has name {
  (this).name : string := x;
}

has name :> interface {
  (this).name : string;
}

print name of (x: has name) => void {
  print x.name;
}

entrypoint => void {
  print name of doctor "?";
  print name of pirate "Bob";
}

Here the interface allows unified access to the fields of two otherwise distinct types. Under the hood this works because, while I've been calling them "fields", they are implemented more like properties. Each field has a getter and a setter function, so the interface simply requires that they be implemented. In the example above we only use the getter, so the interface could be changed to this and still work:

Pirate :> pirate (x: string) :< has name {
  (this).name : string := x;
}

Doctor :> doctor (x: string) :< has name {
  (this).name : string := x;
}

has name :> interface {
  (this).name => string;
}

print name of (x: has name) => void {
  print x.name;
}

entrypoint => void {
  print name of doctor "?";
  print name of pirate "Bob"; 
}
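In other words, each field compiles to a backing slot plus getter/setter functions, and the interface only demands the accessors exist. A rough Java analogue (the accessor names are mine):

```java
public class CompiledField {
    interface HasName {
        String getName(); // the interface only requires the getter
    }

    static class Pirate implements HasName {
        private String name; // backing storage for (this).name

        public String getName() { return name; }            // (this).name read
        public void setName(String value) { name = value; } // (this).name write
    }

    public static void main(String[] args) {
        Pirate p = new Pirate();
        p.setName("Bob");
        HasName boxed = p; // access through the interface, as `print name of` does
        System.out.println(boxed.getName());
    }
}
```

Since the interface never mentions the setter, any type with a readable name satisfies it, mutable or not.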

Next on the list is delegates. Once they're in, the language should be mostly done I think. It will have all of the big language things you need to do stuff, along with a pile of bugs and none of the library/interop things necessary for you to do interesting stuff. Hopefully, delegates will go quickly and I'll figure out some project that will push the pragmatism agenda.

Saturday, June 4, 2016

Basic Local Variables

After this morning's work, Tangent now supports the most basic of local variables. Yes, the language had user defined syntax, delegates, generics, type-class style interfaces, and... no variable support. But now as I move my focus more from building cool stuff to using cool stuff to build other cool stuff, I need to fill in the gaps. That means fixing the inevitable bugs I find, and basic usability stuff like local variables, which are still going to cause trouble since there wasn't any mutable state at all in the language before today.

I wanted local variables to be able to be phrases, just like parameters and types and everything else in the language. Yet I needed them to be clear both to the compiler and the user that they were declarations and not statements, and I didn't have a lot of tokens to work with, since I want most of them to be available for users to build their own phrases with. Let's look at the test code:

entrypoint => void {
  :x:int := 42;
  print x;
  x = x + 1;
  print x;
}

So local variable declarations start with a colon, and then work like (I hope) you'd expect them to - you declare a phrase and its type. Right now, locals require initialization, which uses Pascal-style assignment (`:=`) to differentiate it from `=`, which helps the parser (and user) avoid ambiguity. This code works as of today.

What does not work is capturing locals in lambdas. That will just throw you an exception right off. Nested scopes probably don't work. Variable hiding probably works, but is untested. And due to implementation details, this code probably compiles, but will do unknown and terrible things:

entrypoint => void {
  print x;
  :x:int := 42;
}


Because while the code is smart enough to initialize the locals at the right time, it isn't smart enough to only make them available to the scope when declared, mostly because the scope is immutable and I didn't want to rabbit hole it. 

I have not yet decided if I hate this enough to fix it.

In other usability news, I added support to the command line compiler and test suite to support multiple input files. This should let me reuse code for different programs. I expect that I'm not going to want to reimplement conditionals and loops for every program I write...

Sunday, May 29, 2016

Standalone Interface Bindings

Hello again. In Interfaces - Part 4, I talked through the basic interface support which looks mostly like C# and other popular languages, with a little more functionality. You can declare an interface, and when you declare a class, you can say that it implements that interface. Now for some things that are not like what is available in popular languages.

One of the big things about Haskell type classes is that they don't work like that. They don't require you to include the interface reference when declaring your type. Let's look back at the example from Part 4:

convertible to string :> interface {
  (this) => string;
}

Cow :> cow :< convertible to string {
  (this) => string { "moo..." }
}

(x: convertible to string) speak => void {
  print x;
}

entrypoint => void {
  cow speak;
}

The second declaration is our type declaration for Cow, along with our inline interface binding. With today's check-in you can do standalone interface binding as well:

convertible to string :> interface {
  (this) => string;
}

Cow :> cow { }

Cow :< convertible to string {
  (this) => string { "moo..." }
}

(x: convertible to string) speak => void {
  print x;
}

entrypoint => void {
  cow speak;
}

Here, Cow is tied to convertible to string in its own standalone interface binding declaration, along with the implementation for that interface. This looks and acts a lot more like Haskell type classes. With this, you can take classes defined in some other library and adapt them to your interfaces (or vice versa).

Saturday, May 21, 2016

Interfaces - Part 4 - Compilation

(Part 1, Part 2, Part 3)

The last part of implementing basic interfaces is the back end compiler. This part of the code takes the intermediate representation we made in Part 3 and turns it into CIL that actually runs.

There are two parts to this: creating the actual .NET type that represents our interface, and making the functions that properly dispatch to the right implementation. What is interesting about implementing interfaces as type classes is that each part works like something the language already supports - but they resemble different things.

The .NET type compilation used the sum-type compilation, similar to the intermediate code from part 3. All I needed was a few tweaks to allow the "what are all of the types this could be?" part of the code to get that from the sum-type directly or from the interface mapping. Oh, and allowing sum-types to have only one implementation in the runtime (since interfaces may have only one implementation, even if sum-types can't).

The function compilation on the other hand used the dispatch logic for generics. If you remember, interface functions infer their correct type based on the location of the (this) parameter. But since the type coming in is guaranteed to be a sum-type, the generic dispatch needs to check the type of the stored variant, not the sum-type itself.
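A crude sketch of that dispatch shape in Java (all names illustrative - the real code generates CIL):

```java
public class InterfaceDispatch {
    static class Cow {}

    // The interface value compiles like a sum type: a box holding whichever
    // concrete variant was converted in.
    static class ConvertibleToString {
        final Object variant;
        ConvertibleToString(Object variant) { this.variant = variant; }
    }

    // Dispatch checks the type of the stored variant, not the box itself.
    static String toStringImpl(ConvertibleToString boxed) {
        if (boxed.variant instanceof Cow) {
            return "moo...";
        }
        throw new IllegalStateException("no implementation for this variant");
    }

    public static void main(String[] args) {
        System.out.println(toStringImpl(new ConvertibleToString(new Cow())));
    }
}
```

Each new implementing type adds another arm to the dispatch, mirroring how the generic dispatch logic already handles specializations.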

Wrap that all together, and my basic test code now compiles and runs properly:

convertible to string :> interface {
  (this) to string => string;
}

Cow :> cow :< convertible to string {
  (this) to string => string { "moo..." }
}

(x: convertible to string) speak => void {
  print x to string;
}

entrypoint => void {
  cow speak;
}

Hopefully this is straightforward - the code first declares an interface, which requires a single function. It then declares a class Cow with nullary constructor cow and implements the interface. And then it declares a function which works with that interface to print something to the screen so we can tell if it works. Finally, the entry point ties it all together by invoking that function with a concrete instance of Cow. Super dumb test code.

"So what?" you say, "Any plain interface implementation can do that."

That is entirely true. So let's convert this over to something that .NET-style interfaces can't do. All the interface needs is some way to convert the object to a string. So why not just declare that?

convertible to string :> interface {
  (this) => string;
}

Cow :> cow :< convertible to string {
  (this) => string { "moo..." }
}

(x: convertible to string) speak => void {
  print x;
}

entrypoint => void {
  cow speak;
}

Yeah, that works too. Here our convertible to string interface requires its implementors to supply an implicit conversion to string. Since print requires a string as its input, the language is smart enough to infer that you want the conversion to happen there.

Next on the list is non-inline interface binding, fancier examples, and .NET interop. I'm not sure which will come first - we'll see where whimsy takes me.

Friday, May 6, 2016

Interfaces - Part 3 - Intermediate Steps

(Part 1, Part 2)

The second step to getting interfaces working is to actually do something with the stuff we parsed in Part 2. The grammar just spots interface declarations from the tokens it sees as input. The intermediate code is what takes those declarations and resolves them into real types. It's also where the function implementations are turned into a syntax tree, which is extra challenging since Tangent has no fixed order of operations.

The first order of business was to have a class to keep track of interface declarations. Since under the covers, all type declarations (enums, sum types, product types, type aliases, even built in types) are a phrase and a Tangent Type, I went with a new Tangent Type to represent the interface declarations themselves. The object has references to the functions required to satisfy it and... well, that's it. For the moment, that's all interfaces are.

If you remember from part 2, those functions are actually generic, and (this) needs to be replaced by a generic inference. That was the next bit of work to be done. Easy enough, a little bit of code between the grammar and intermediate objects to do a replacement.

Now I have interfaces, and the functions they need. To allow the code to know what types should implement what interfaces, I needed to add something to the type resolver to make those relations. Since interfaces are just a different sort of type declaration, I already had code to resolve type declarations, making that relatively straightforward. Now I have all of my interfaces in my list of type declarations, and all of the relations between them and concrete types.

But how to let the order of operation inference code know about the relation? It doesn't know about subtyping, and based on my previous iterations, I know that really complicates that code if it shows up. And these interfaces aren't really subtyping anyways - they work like type classes. Type classes in turn work like sum types: you have some symbol that means "this could be A or B or C or...". So interfaces got implemented similarly to sum types in this regard. The compiler adds an implicit conversion function that goes from the implementation to the interface. The order of operation inference code then treats that like any other function, allowing me to do no changes there. Awesome.

After that thing runs, I get a syntax tree out with functions that call the interfaces' functions if appropriate to go with the type declaration information. The compiler is ready to pass that to the back end which will turn it into executable CIL. More on that next time.

Sunday, April 10, 2016

Interfaces - Part 2 - Grammar

The first step to getting interfaces working is to tweak the parser to actually recognize the damned things. Well, that's not quite right. The first step is that I need to figure out what I want, and what parts make that up. Let's look at what I want from Part 1. I want them to do:

  • I want interfaces to be some contract that types can supply. "I don't care what type is passed in here, as long as it can be enumerated." 
  • I need interfaces to be able to be generic - I am not writing IntEnumerable, StringEnumerable, BoolEnumerable, and so on. 
  • I want interfaces to be multiply-implemented. Just because something is enumerable should not prevent it from being comparable.
  • I would like interfaces to be open. That is, if you wrote AwesomeParser and I have IParser, I should be able to adapt your implementation to my interface without changing your code.
  • It'd be nice if interfaces could support operators better. .NET interfaces suck at things like T+T => T.
Some of you out there will notice that these requirements are eerily close to what Haskell type classes do. That is no mistake. Haskell's type system is widely regarded as one of the most awesome made by man, and type classes in particular are one of the unique parts that set it apart. Type classes though work a lot more like .NET generic constraints than .NET interfaces. So how to design the language to support these things in a way that is easy/familiar, without breaking the formalisms necessary to keep everything from falling apart?

For the grammar, there are three main things I need to change to support interfaces:


  • I need to be able to declare an interface - "Here is a contract, and here is how you access it."
  • I need to be able to say that a type implements that interface - "Type X implements that contract"
  • I need to be able to use that interface in functions - "This function needs something that is enumerable as its first argument"

To declare an interface in Tangent, you use the standard type declaration syntax similarly to .NET interfaces. Say we wanted an interface to allow classes to provide a human readable view:

    human readable :> interface {
        human readable (this) => string;
    }

This defines an interface called "human readable" that has a single required function, "human readable" followed by a variable of the implementing type, which returns a string. In the grammar, this means defining new rules for the interface declaration and adding them to the type declaration rule.

To declare a class that uses the interface, it is included in the declaration, akin to .NET style inheritance:

    point :> (x: int), (y: int) :< human readable {
       human readable (this) => string {
           x + ", " + y;
       }
   
    } 

The only difference is that you use that funky duck face :< rather than a plain colon. This is done to better visually distinguish the interface implementations from the parameters, and to help when non-inline interface binding comes into play (we'll get to that in later parts). Again, this is declaring a type named "point", whose constructor is two ints with a comma in between, and which implements the human readable interface. It then goes on to implement the required function.

In the grammar, this means adding the inline interface binding part as optional. The challenge here is that the sum-type extension (x|y|z) is in the same area and also optional. That broke some tests and will likely be a complexity in the parser.

And to use the interface in a function, you can just specify it as the type:

    display (value: human readable) => void {
        print human readable value;
    }

Which works as you would expect from .NET land. You get a function that takes anything that satisfies the interface, and you can use any function guaranteed to be there. In the grammar... that isn't any change at all. But remember when I said that type classes are more like generic constraints than real interfaces? Yeah, here is where that comes into play. When that gets compiled, it will look more like this C# declaration:

    void display<T>(T value) where T: IHumanReadable { ... }

and Tangent does the magic to include the generic parameter that you don't really care about. But what if you do care about it? That's where the grammar change comes into play. If you care about the actual type being passed in, then you can declare parameters like this:

    display (value: T: human readable) => void { ... }

which will bind the actual concrete type to the phrase T so you can work with it. That isn't important here, but it does matter in other scenarios where the type-classiness of Tangent interfaces comes into play. Consider this interface to do a C-style compare (0 is equal, >0 is greater than, <0 is less than):

    comparable :> interface {
        compare (this) to (this) => int;
    }

Great, now we have a nice interface that says you can take two things and compare them. Let's use this in a trivial function:

    smaller (a: comparable) or (b: comparable) => comparable {
        if compare a to b < 0
          then a
          else b;
    }

That does not work.

What the interface says is that the compare function takes two arguments of the same type. The smaller function takes two comparables, but they might not be the same type. So that's where the three-part parameter declaration comes into play:

    smaller (a: T: comparable) or (b: T) => T {
        if compare a to b < 0
          then a
          else b;
    }

In this function, we're specifically saying that the second argument (and the return value) must be the same type as the first argument. Since that meets the interface's constraints (and because Tangent doesn't have subtyping), that works. 
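The C# analogue from earlier extends naturally: the three-part declaration compiles to a single constrained type parameter shared by both arguments and the return value. A hedged Java equivalent:

```java
public class Smaller {
    // Rough analogue of `smaller (a: T: comparable) or (b: T) => T`:
    // one type variable ties both arguments and the return together,
    // and the bound plays the role of the interface constraint.
    static <T extends Comparable<T>> T smaller(T a, T b) {
        return a.compareTo(b) < 0 ? a : b;
    }

    public static void main(String[] args) {
        System.out.println(smaller(3, 7));
        System.out.println(smaller("pear", "apple"));
    }
}
```

Try to call it with an Integer and a String and the compiler rejects it, which is exactly the guarantee the three-part declaration buys.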

So those are the three grammar changes going into the language for basic interface support. Next up: actually making them do something now that the parser can recognize them. Part 3 and more to come!


Monday, March 28, 2016

Interfaces - Part 1

One of the motivations for making Tangent compile into CIL was that it would make it dead simple to interop with the huge pile of .NET code out there in the world. I mean, I would love to rewrite a bunch of library code, but I expect mine would be worse than the stuff people have spent years researching and refining.

And I'm lazy.

So instead, I'm going to do a whole bunch of work implementing a well known feature in a weird way for a language that nobody will ever use. \o/

The problem is that Tangent does not have subtyping. And .NET code has... subtyping. I mean, I want IEnumerable to work - that doesn't seem to be too much to ask. Tangent though has some challenges with subtyping. For one, subtyping adds a lot of ambiguity to the order of operations inference engine. When an expression could be a few different types, you have more branches in the possible options to take. Another is dispatch. For .NET interfaces and single dispatch, there is a clear path to the implementation. Since Tangent allows multiple dispatch, it'd be weird to have interfaces behave differently.

So, what do I want interfaces to do?


  • I want interfaces to be some contract that types can supply. "I don't care what type is passed in here, as long as it can be enumerated." 
  • I need interfaces to be able to be generic - I am not writing IntEnumerable, StringEnumerable, BoolEnumerable, and so on. 
  • I want interfaces to be multiply-implemented. Just because something is enumerable should not prevent it from being comparable.
  • I would like interfaces to be open. That is, if you wrote AwesomeParser and I have IParser, I should be able to adapt your implementation to my interface without changing your code.
  • It'd be nice if interfaces could support operators better. .NET interfaces suck at things like T+T => T.
Those are the goals. Now to work out the kinks of how I plan on actually doing this stuff. I need to come up with an idea that actually works. It needs to integrate nicely with the existing language. And it needs to get compiled into CIL that is at least a little performant. More to come in Part 2.

Sunday, March 20, 2016

Parser Refactoring

Not a lot of sexy work in the past few months. I wanted to add features - local variables, interfaces, .NET interop, something - but just found myself dreading going into the parser code to extend it. So instead, I spent a little time refactoring that to suck less.

I should note that I'm talking about the actual parser code for type and function declarations, not Tangent's fancy inference engine for function bodies. The parser code used to be just a series of functions. They took tokens, they returned an optional type, popping tokens as needed on success, perhaps calling other parser functions. When the grammar was a dozen or so rules, that was fine. It was easy to step through, and it was easy to unit test.

That quickly got squirrelly as I added generics, function params, product types, and sum types. The grammar is still only about two dozen rules, but they interact a lot more. This means each parsing function is doing more, and that added complexity was harming unit testing and my ability to jump back into the code after some time doing real work.

So, I went back to an old stand-by: the compositional parser. One of the first things I did when I learned C# was to create a parsing framework akin to boost::spirit for the first version of Tangent. I'm not using that though. It generates full parse trees, which are awkward to work with. I just tossed together something similar, but specific to the existing structure in the Tangent code.

What this allows me to do is to define the grammar more declaratively. That makes it easier to see what's going on. And since the declarations get real ugly real fast, it pushes me to separate things out into testable, reusable chunks. And since I was smart enough to have a small set of regression tests, I could make sure that the refactoring didn't break anything I expected to work before.
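For flavor, here's a toy sketch of that compositional style in TypeScript (the combinator names are mine, not Tangent's): each grammar rule is a plain value, so rules combine declaratively and unit-test in isolation.

```typescript
// A rule consumes tokens and either yields a value plus the remaining
// tokens, or null on failure - no tokens consumed.
type Parser<T> = (tokens: string[]) => { value: T; rest: string[] } | null;

// Match one literal token.
const token = (expected: string): Parser<string> => (tokens) =>
  tokens[0] === expected ? { value: tokens[0], rest: tokens.slice(1) } : null;

// Run two rules back to back, threading the leftover tokens through.
const sequence = <A, B>(a: Parser<A>, b: Parser<B>): Parser<[A, B]> => (tokens) => {
  const ra = a(tokens);
  if (!ra) return null;
  const rb = b(ra.rest);
  if (!rb) return null;
  return { value: [ra.value, rb.value], rest: rb.rest };
};

sequence(token("pair"), token(":>"))(["pair", ":>", "(T)"]);
// => { value: ["pair", ":>"], rest: ["(T)"] }
```

Bigger rules fall out of smaller ones the same way, which is what makes the grammar read declaratively instead of as a pile of hand-written token-popping functions.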



The stable checkpoint is available at https://github.com/Telastyn/Tangent/releases/tag/Milestone3.1

Wednesday, January 6, 2016

Delegates - Part 2

I talked a little bit last time about how Tangent's lambdas are a little weird compared to C# and other languages. Today, I'm going to go a little more in depth into that, as well as showing some other examples of use. The first is an example that uses the return value of the lambda:

apply (fn(int): int) twice to (x: int) => int {
  fn fn x;
}

entrypoint => void {
  print apply (x)=>{x*x} twice to 2;
}

Because of the order of operation inference, fn fn x gets parsed out to the right order, passing the result of one operation to the next.
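In more conventional syntax, that function amounts to something like this TypeScript sketch (the name `applyTwice` is mine, not Tangent's):

```typescript
// "apply fn twice to x": feed the lambda's result back into itself.
function applyTwice(fn: (x: number) => number, x: number): number {
  return fn(fn(x));
}

applyTwice(x => x * x, 2); // (2*2) squared = 16
```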

Now for some weirder stuff. Because of how Tangent parses statements, a lambda can have different interpretations depending on what type it resolves to. So when the compiler matches a lambda to a function parameter it needs to make sure that the lambda works for those types. A simple example:

foo(fn(int):void) => void {
  fn 42;
}

foo(fn(string):void) => void {
  fn "foo";
}

entrypoint => void {
  foo (x) => { print x + 2; };
}

Since string + int makes no sense, the int -> void overload is chosen and this prints 44. Weirder yet, Tangent includes the lambda's return type in that matching. If the lambda can't return the requested type, that interpretation is discarded. Consider a similar example:

foo(fn(int): int) => void {
  print "with int";
  print fn 42;
}

foo(fn(int): void) => void {
  print "with void";
  fn 42;
}

entrypoint => void {
  foo (x) => {x;};
  foo (x) => {print x;};
}

Here the different bodies return different types, leading to dispatching to the different overloads and avoiding ambiguity.
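TypeScript's ordered overload signatures give a rough feel for this, though it's only an analogue: TS tries each signature in declaration order, so a number-returning lambda matches the first and a void-returning one falls through to the second. The runtime `undefined` check below is my stand-in for what Tangent resolves entirely at compile time; the names are illustrative.

```typescript
// Rough analogue of return-type-sensitive lambda dispatch.
function foo(fn: (x: number) => number): string;
function foo(fn: (x: number) => void): string;
function foo(fn: (x: number) => number | void): string {
  const result = fn(42);
  // Distinguish the two cases at runtime, since both overloads share
  // this one implementation in TypeScript.
  return result === undefined ? "with void" : "with int: " + result;
}

foo(x => x);                   // "with int: 42"
foo(x => { console.log(x); }); // "with void" (and prints 42)
```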

Nifty, and it all worked once I wrote up the tests. Next will be maybe cleaning up a little of the lambda syntax and testing more of the closure functionality.