Saturday, September 10, 2016

The Cost of Pragmatism

Sorry for the delay in posting. As promised, I moved towards a bit more pragmatic focus with the language. Towards that goal, I made a tiny program:

a random number => int {
  new .NET System.Random.Next;
}

a random number less than (exclusive maximum : int) => int {
  new .NET System.Random.Next exclusive maximum;
}

a random number between (inclusive minimum : int) and (exclusive maximum : int) => int {
  new .NET System.Random.Next inclusive minimum exclusive maximum;
}

get a number from console => int {
  .NET System.Console.Write "Enter a number between 0 and 100: ";
  : input : string := .NET System.Console.ReadLine;
  int.Parse input;
}

provide user feedback based on (guess : int) and (target : int) => void {
  if guess = target 
    .NET System.Console.WriteLine "Correct!"
  else
    if guess < target
      .NET System.Console.Write "Sorry, your guess was too low. "
    else 
      .NET System.Console.Write "Sorry, your guess was too high. "
}

entrypoint => void {
  : target : int := a random number less than 100;
  : guess : int := 101;

  while not guess = target {
    guess = get a number from console;
    provide user feedback based on guess and target;
  }
}

A tiny console guessing game. Yes, that's what I've been working on for 2 months... At the end of the last post I hinted at performance problems with simple code with .NET imports. Turns out, that was a pretty big deal. The program above took more than 14 hours (I got tired of waiting, and killed it) to compile 2 months ago. 

The problem was in the core of the language. The way Tangent works is to parse all of the declarations, and then use them to infer meaning. That inference engine was pretty straightforward. Assuming a list of tokens coming in, a target type, and a set of rules:

  if tokens.count == 1 && tokens[0].effective_type == target_type
    return tokens;
 
  for index = 0; index < tokens.count; ++index
    foreach rule in ruleset

      if rule matches tokens at index
        replace tokens with rule result, and recurse

There's some extra-cruft in there for handling ambiguity, but the core of it is a shift-reduce parser in spirit. And that is great when you have a few dozen rules and 5-10 inputs. Now with tens of thousands of rules and larger inputs, big-O reminds you who is boss.

What I've done so far is to remove the linear search from the ruleset. It now is a bit smarter about limiting the legal rules based on the first token it sees, and then the second, and so on. I've also removed sum-types from the language. As soon as things could be A or B, it started exploding the search combinations again due to how the language is setup. Since I only really wanted them for nullable types, and I realized I couldn't really use them for nullable types (since Optional<A> and Optional<B> aren't related, even if A and B are) they got dropped. Pragmatism first.

Problem 2 was with my handling of closures. If you remember, while loops are not part of the language - they're defined in code as:

  while (condition: ~>bool) (body: ~>void) => void {
      if condition { body; while condition body; };
  }

Where ~> is the type operator to make things lazily evaluated. This means that not guess = target and the braces are their own separate delegates. And when I made them, I was closing around the locals twice, copying them in each time. So when I got the input from the console, it went into a different closure than the while conditional. So I got to take the slapdash closure code I hacked up ages ago and make it work properly. Oh, and I needed to figure out what "properly" was, since there aren't exactly a whole lot of good tutorials on the mechanics of implementing closures.

Problem 3 was smaller thankfully. Since numbers can't be part of identifiers in Tangent, I couldn't call Int32.Parse. Oops. I just hacked that in manually for now.

Problem 4 was a standard sort of bug. When running the game, it would tell you when you guessed the number, but wouldn't exit. I figured it was another closure issue, or perhaps something with tail calls in delegates. No, I had just implemented not as the CIL opcode not, which is bitwise not, which happily worked on (bool)1, yielding true again.

Next is working on interface interop, performance improvements, or some other pragmatic step forward.

Wednesday, July 20, 2016

Basic Interop

(For those wondering, I did get delegate fields working. They were not too much trouble. You can read about them in the release notes.)

As promised, I'm taking a bit of a shift towards pragmatism in my work, meaning some efforts to do interop with existing .NET code. Doing that comes with some design decisions and coding challenges. Let's go through some of them, since they'll impact most anyone writing a compiler:


How can I do .NET interop without making it neigh impossible to target other languages?


Well, the back end of the compiler (the bits that take structured intermediate forms and generate CIL from them) already are highly coupled with .NET. That code can know all it wants to about .NET. The front end though (the bits that take text and turn it into structured intermediate forms) doesn't know much about .NET. Even the built in types are just placeholders that the back end uses to tie to real ints and strings and the such.

But because of how Tangent works, the parser needs to see all of the legal phrases so it can infer meaning in the code. That means I need something from .NET interop land to generate the phrases for the Tangent code to refer to. And that something needs to carry enough information so that the back end can turn the phrase back into .NET interaction.

In the end, I erred towards YAGNI. I mean, I'll be lucky if I get a dozen people to read this blog, let alone write code in the language. There's no need to support shifting target languages. But the engineer in me can't just couple that code meaninglessly, so it ended up as a mostly separate set of pieces with some interop agnostic interfaces to tie it into the front end. The code that imports .NET code into the front end is just sitting in the front end for ease of implementation, but is mostly separate.

How can I deal with reference types when my language doesn't have null?


I'd really like it to not have the nulls ever, but .NET interop is going to require it somehow. I could make all reference types Optional<T> sort of things, which makes the null very explicit, but the interop code verbose and unweildy. I could make the .NET types nullable, but not Tangent types which kinda works, but will carry over the problems of null and add the leaky abstraction that Tangent types behave differently from interop types. I could just not allow Tangent code to assign or check null, using exceptions to handle nullref... but that is terrible.

In the current implementation, I mostly went the optional route. Constructors return real types, this-params take real types, but all parameters and return values are Optional<T>, requiring explicit null handling in Tangent. It might suck.

How can I deal with subtyping when my language doesn't have it?


Tangent only has interface implementation sorts of subtyping. The compiler doesn't support it, and since the inference engine needs to know that sort of stuff to know what a "legal" interpretation of a phrase is, the information needs to get there somehow. I could add subtyping, but adding something to the language just for interop seems dirty. I could not do subtyping and force explicit casts, but that'd be obnoxious. The tentative plan is to implement conversion functions so that the language can shift types upwards, and have the implementation be some sort of no-op that the back end compiler can deal with. That still will require some sort of inlining so that I'm not kneecapping performance due to my own stubbornness. (More on that later).

The basics are there now though. I need to get it wired into the command line interface, but in the test code, you can specify a collection of assemblies which the code will import the best it can. Non-generic types are largely there. Field access/assignment, properties, functions, constructors, extension methods - all there for instance and static forms. This test code will now compile and run properly with the right imports:

entrypoint => void {
  print "foo".Length;
  print "foobar".Length;
}

Which is awesome. No need to manually specify bindings to .NET functions any more, and the back end work was dead simple.

The bad news is that the above example takes about 50 seconds to compile and run. The other examples I've posted, including the ones doing weird partial specialization took about 200 milliseconds. The addition of a few thousand phrases for the inference code to wade through will do that. It's not terribly surprising, but disappointing none the less.

I guess those optimizations aren't premature anymore...

Sunday, June 19, 2016

Fields

The next step after adding local variables was to extend that concept along to actual fields in classes. Not the sexiest feature in the world, but as the language pushes towards doing stuff, pragmatism will win. Fields (hopefully) look and act like you'd expect - they look like local variables without the preceding colon, and initialize when the class is created. Let's look at the test code:

foo :> new Foo(x: int) {
  (this).test : int := 42 + x;
  (this) => int { this.test } 
}

entrypoint => void {
  print new Foo 5;     // 47
  :x:foo := new Foo 5;
  print x;             // still 47
  x.test = 5;
  print x;             // 5
} 

The only weird part is that (this) bit in there. Since fields can be phrases like everything else in the language, you need to specify where the owner of the field is fetched from. The initializers can pull from the constructor parameters, and from other fields - though there is a bug currently where the compiler may not order the dependencies of initializers properly.

Slightly weird, but nothing ground breaking. You have fields, you can read from them and you can write to them. Now to something a little more weird. I also added support for fields to interfaces. They don't have initializers, but otherwise use the same syntax. Let's look at an example:

Pirate :> pirate (x: string) :< has name {
  (this).name : string := x;
}

Doctor :> doctor (x: string) :< has name {
  (this).name : string := x;
}

has name :> interface {
  (this).name : string;
}

print name of (x: has name) => void {
  print x.name;
}

entrypoint => void {
  print name of doctor "?";
  print name of pirate "Bob"; 

}

Here the interface allows unified access to the fields of two otherwise distinct types. Under the hood, this works because while I've been using "field" to refer to fields, they are implemented similarly to properties. Each field has a getter and a setter function, so the interface simply requires they be implemented. In the example above, we're only using the getter, so could be changed to this and still work:

Pirate :> pirate (x: string) :< has name {
  (this).name : string := x;
}

Doctor :> doctor (x: string) :< has name {
  (this).name : string := x;
}

has name :> interface {
  (this).name => string;
}

print name of (x: has name) => void {
  print x.name;
}

entrypoint => void {
  print name of doctor "?";
  print name of pirate "Bob"; 
}

Next on the list is delegates. Once they're in, the language should be mostly done I think. It will have all of the big language things you need to do stuff, along with a pile of bugs and none of the library/interop things necessary for you to do interesting stuff. Hopefully, delegates will go quickly and I'll figure out some project that will push the pragmatism agenda.

Saturday, June 4, 2016

Basic Local Variables

After this morning's work, Tangent now supports the most basic of local variables. Yes, the language had user defined syntax, delegates, generics, type-class style interfaces, and... no variable support. But now as I move my focus more from building cool stuff to using cool stuff to build other cool stuff, I need to fill in the gaps. That means fixing the inevitable bugs I find, and basic usability stuff like local variables, which are still going to cause trouble since there wasn't any mutable state at all in the language before today.

I wanted local variables to be able to be phrases, just like parameters and types and everything else in the language. Yet I needed them to be clear both to the compiler and the user that they were declarations and not statements, and I didn't have a lot of tokens to work with, since I want most of them to be available for users to build their own phrases with. Let's look at the test code:

entrypoint => void {
  :x:int := 42;
  print x;
  x = x + 1;
  print x;
}

So local variable declarations start with a colon, and then work like (I hope) you'd expect them to - you declare a phrase and its type. Right now, locals require initialization, which uses pascal-style assignment to differentiate it from `=`, which helps the parser (and user) avoid ambiguity. This code works as of today.

What does not work is capturing locals in lambdas. That will just throw you an exception right off. Nested scopes probably don't work. Variable hiding probably works, but is untested. And due to implementation details, this code probably compiles, but will do unknown and terrible things:

entrypoint => void {
  print x;
  :x:int := 42;
}


Because while the code is smart enough to initialize the locals at the right time, it isn't smart enough to only make them available to the scope when declared, mostly because the scope is immutable and I didn't want to rabbit hole it. 

I have not yet decided if I hate this enough to fix it.

In other usability news, I added support to the command line compiler and test suite to support multiple input files. This should let me reuse code for different programs. I expect that I'm not going to want to reimplement conditionals and loops for every program I write...

Sunday, May 29, 2016

Standalone Interface Bindings

Hello again. In Interfaces - Part 4, I talked through the basic interface support which looks mostly like C# and other popular languages, with a little more functionality. You can declare an interface, and when you declare a class, you can say that it implements that interface. Now for some things that are not like what is available in popular languages.

One of the big things about Haskell type classes is that they don't work like that. They don't require you to include the interface reference when declaring your type. Let's look back at the example from Part 4:

convertible to string :> interface {
  (this) => string;
}

Cow :> cow :< convertible to string {
  (this) => string { "moo..." }
}

(x: convertible to string) speak => void {
  print x;
}

entrypoint => void {
  cow speak;
}

The second declaration is our type declaration for Cow, along with our inline interface binding. With today's check-in you can do standalone interface binding as well:

convertible to string :> interface {
  (this) => string;
}

Cow :> cow { }

Cow :< convertible to string {
  (this) => string { "moo..." }
}

(x: convertible to string) speak => void {
  print x;
}

entrypoint => void {
  cow speak;
}

Here, Cow is tied to convertible to string in its own standalone interface binding declaration, along with the implementation for that interface. This looks and acts a lot more like Haskell type classes. With this, you can take classes defined in some other library and adapt them to your interfaces (or vice versa).

Saturday, May 21, 2016

Interfaces - Part 4 - Compilation

(Part 1Part 2, Part 3)

The last part of implementing basic interfaces is the back end compiler. This part of the code takes the intermediate representation we made in Part 3 and turns it into CIL that actually runs.

There's two parts to this, creating the actual .NET type that represents our interface, and making the functions that properly dispatch to the right implementation. What is interesting about implementing interfaces like type-classes is that the two parts work like things the language already supports, but the two parts resemble different things.

The .NET type compilation used the sum-type compilation, similar to the intermediate code from part 3. All I needed was a few tweaks to allow the "what are all of the types this could be?" part of the code to get that from the sum-type directly or from the interface mapping. Oh, and allowing sum-types to have only one implementation in the runtime (since interfaces may have only one implementation, even if sum-types can't).

The function compilation on the other hand used the dispatch logic for generics. If you remember, interface functions infer their correct type based on the location of the (this) parameter. But since the type coming in is guaranteed to be a sum-type, the generic dispatch needs to check the type of the stored variant, not the sum-type itself.

Wrap that all together, and my basic test code now compiles and runs properly:

convertible to string :> interface {
  (this) to string => string;
}

Cow :> cow :< convertible to string {
  (this) to string => string { "moo..." }
}

(x: convertible to string) speak => void {
  print x to string;
}

entrypoint => void {
  cow speak;
}

Hopefully this is straightforward - the code first declares an interface, which requires a single function. It then declares a class Cow with nullary constructor cow and implements the interface. And then it declares a function which works with that interface to print something to the screen so we can tell if it works. Finally, the entry point ties it all together by invoking that function with a concrete instance of Cow. Super dumb test code.

"So what?" you say, "Any plain interface implementation can do that."

That is entirely true. So let's covert this over to something that .NET style interfaces can't do. All the interface needs is some way to convert the object to a string. So why not just declare that?

convertible to string :> interface {
  (this) => string;
}

Cow :> cow :< convertible to string {
  (this) => string { "moo..." }
}

(x: convertible to string) speak => void {
  print x;
}

entrypoint => void {
  cow speak;

}

Yeah, that works too. Here our convertible to string interface requires its implementors to supply an implicit conversion to string. Since print requires a string as its input, the language is smart enough to infer that you want the conversion to happen there.

Next on the list is non-inline interface binding, fancier examples, and .NET interop. I'm not sure which will come first - we'll see where whimsy takes me.

Friday, May 6, 2016

Interfaces - Part 3 - Intermediate Steps

(Part 1, Part 2)

The second step to getting interfaces working is to actually do something with the stuff we parsed in Part 2. The grammar just spots interface declarations from the tokens it sees as input. The intermediate code is what takes those declarations and resolves them into real types. It's also where the function implementations are turned into a syntax tree, which is extra challenging since Tangent has no fixed order of operations.

The first order of business was to have a class to keep track of interface declarations. Since under the covers, all type declarations (enums, sum types, product types, type aliases, even built in types) are a phrase and a Tangent Type, I went with a new Tangent Type to represent the interface declarations themselves. The object has references to the functions required to satisfy it and... well, that's it. For the moment, that's all interfaces are.

If you remember from part 2, those functions are actually generic, and (this) needs to be replaced by a generic inference. That was the next bit of work to be done. Easy enough, a little bit of code between the grammar and intermediate objects to do a replacement.

Now I have interfaces, and the functions they need. To allow the code to know what types should implement what interfaces, I needed to add something to the type resolver to make those relations. Since interfaces are just a different sort of type declaration, I already had code to resolve type declarations, making that relatively straight forward. Now I have all of my interfaces in my list of type declarations, and all of the relations between them and concrete types.

But how to let the order of operation inference code know about the relation? It doesn't know about subtyping, and based on my previous iterations, I know that really complicates that code if it shows up. And these interfaces aren't really subtyping anyways - they work like type classes. Type classes in turn work like sum types: you have some symbol that means "this could be A or B or C or...". So interfaces got implemented similarly to sum types in this regard. The compiler adds an implicit conversion function that goes from the implementation to the interface. The order of operation inference code then treats that like any other function, allowing me to do no changes there. Awesome.

After that thing runs, I get a syntax tree out with functions that call the interfaces' functions if appropriate to go with the type declaration information. The compiler is ready to pass that to the back end which will turn it into executable CIL. More on that next time.