Sunday, December 11, 2016

Generics and you.

Sorry for the delay in writing. I've taken to writing a bit in the various Slacks I'm in, which lend themselves a bit better to the day to day work to make things function. But they're not public, so here we are.

Speaking of which, the slack denizens complained about lack of documentation, so now the github readme is updated to cover the basics of the language. It's mostly simple snippets to help define syntax and provide a gentle-ish introduction to phrases and the idioms of the language. And of course, writing them up gave me ample opportunity to find bugs.

The focus of bugs over the last 3 months has largely been around generics. There are three major places where these bugs came up: in my type system, doing codegen to .NET, and how they interoperate with closures.

In Tangent, generics are represented by a ParameterDeclaration. It's a simple class that represents the name : type sort of declarations used throughout the language. The type for generics is the constraint - and for now, almost always any. Parameter declarations though aren't types. In the type system, to refer to a generic you would use one of two types - a GenericArgumentReferenceType or a GenericInferencePlaceholder. These are meant to line up with the .NET sort of implementation where generic parameters in function parameters can infer the type based on their arguments, and generic parameters in classes or as return values just reference some explicit specification. The core parameter declaration in a type declaration is that explicit specification.

And that all fell apart when doing the actual implementation. Consider a simple class declaration.

pair (T) :> (x : T), (y : T) {   
  (this).x : T := x;       
  (this).y : T := y;       
}

T on the left side is the parameter declaration, and all of the other Ts are generic references, yes? Not so much. That compiles into C# code that looks something like this:

public class pair<T> {
  public T x;
  public T y;
}

public static pair<T> ctor<T>(T x, T y) {
  pair<T> result = new pair<T>();
  result.x = x;
  result.y = y;
  return result;
}

public static T getx<T>(pair<T> owner) {
  return owner.x;
}

public static void setx<T>(pair<T> owner, T value) {
  owner.x = value;
}

public static T gety<T>(pair<T> owner) {
  return owner.y;
}

public static void sety<T>(pair<T> owner, T value) {
  owner.y = value;
}

So while the T for the fields is an actual reference to the class' generic, the rest aren't. They get their generic info from the method they're tied to. And of course - since everything is named T there was ample opportunity for me to get them mixed up. 

That is all terrible, and it gets worse. What happens when you toss closures into the picture?

Let's take a look at an implementation of the ternary operator (x?a:b) in Tangent:

(conditional : bool) ? (a : ~>(T)) | (b : ~>T) => T {
  : result : ~> T := b;
  if conditional result = a;
  result;
}

A little weird, but nothing outlandish. We have a phrase that takes a bool, a question mark, something that yields T (think Func<T> in C#), a pipe, and something that yields T again. That phrase will return whatever T is. The gotcha here is line 2 of the function. Because if statements are defined in the language, result = a is inferred to mean () => { result = a; } meaning I need to close over result and a to make that work. For those of you that are lucky enough to not know how closures are implemented, it compiles into something like this:

public class Closure0<T> {
  public bool conditional;
  public Func<T> a;
  public Func<T> b;
  public Func<T> result;

  public void Implementation0() {
    result = a;
  }
}

public static T ternary(bool conditional, Func<T> a, Func<T> b) {
  Closure0<T> scope = new Closure0<T>();
  scope.conditional = conditional;
  scope.a = a;
  scope.b = b;
  
  scope,result = b;
  if (scope.conditional, scope.Implementation0);
  return scope.result;
} 

Which is gross on the best of days. But you'll also note that now we've got class level generics again. And again some Ts are function level generics and some Ts are class level generics. Worse yet, they change depending on if you're calling the closure variable from the closure scope or from the function scope. Bug city.

The good news is that the simple examples above now work. Also, I've gotten enough interop working that I can actually make .NET Lists and add things to them. That required a small change to the language to support explicit generic arguments in functions. That allows stupid code like this:

foo (T) => void {
  print "foo"
}

entrypoint => void {
  foo int

}

But also allows new List<int> to be a phrase that the language can interpret. Next on the docket is interfaces. The interface code I tried to do for the readme doesn't work, I can't use IEnumerable, and I certainly can't make Tangent types work with .NET interfaces or .NET types with Tangent interfaces. 

Saturday, September 10, 2016

The Cost of Pragmatism

Sorry for the delay in posting. As promised, I moved towards a bit more pragmatic focus with the language. Towards that goal, I made a tiny program:

a random number => int {
  new .NET System.Random.Next;
}

a random number less than (exclusive maximum : int) => int {
  new .NET System.Random.Next exclusive maximum;
}

a random number between (inclusive minimum : int) and (exclusive maximum : int) => int {
  new .NET System.Random.Next inclusive minimum exclusive maximum;
}

get a number from console => int {
  .NET System.Console.Write "Enter a number between 0 and 100: ";
  : input : string := .NET System.Console.ReadLine;
  int.Parse input;
}

provide user feedback based on (guess : int) and (target : int) => void {
  if guess = target 
    .NET System.Console.WriteLine "Correct!"
  else
    if guess < target
      .NET System.Console.Write "Sorry, your guess was too low. "
    else 
      .NET System.Console.Write "Sorry, your guess was too high. "
}

entrypoint => void {
  : target : int := a random number less than 100;
  : guess : int := 101;

  while not guess = target {
    guess = get a number from console;
    provide user feedback based on guess and target;
  }
}

A tiny console guessing game. Yes, that's what I've been working on for 2 months... At the end of the last post I hinted at performance problems with simple code with .NET imports. Turns out, that was a pretty big deal. The program above took more than 14 hours (I got tired of waiting, and killed it) to compile 2 months ago. 

The problem was in the core of the language. The way Tangent works is to parse all of the declarations, and then use them to infer meaning. That inference engine was pretty straightforward. Assuming a list of tokens coming in, a target type, and a set of rules:

  if tokens.count == 1 && tokens[0].effective_type == target_type
    return tokens;
 
  for index = 0; index < tokens.count; ++index
    foreach rule in ruleset

      if rule matches tokens at index
        replace tokens with rule result, and recurse

There's some extra-cruft in there for handling ambiguity, but the core of it is a shift-reduce parser in spirit. And that is great when you have a few dozen rules and 5-10 inputs. Now with tens of thousands of rules and larger inputs, big-O reminds you who is boss.

What I've done so far is to remove the linear search from the ruleset. It now is a bit smarter about limiting the legal rules based on the first token it sees, and then the second, and so on. I've also removed sum-types from the language. As soon as things could be A or B, it started exploding the search combinations again due to how the language is setup. Since I only really wanted them for nullable types, and I realized I couldn't really use them for nullable types (since Optional<A> and Optional<B> aren't related, even if A and B are) they got dropped. Pragmatism first.

Problem 2 was with my handling of closures. If you remember, while loops are not part of the language - they're defined in code as:

  while (condition: ~>bool) (body: ~>void) => void {
      if condition { body; while condition body; };
  }

Where ~> is the type operator to make things lazily evaluated. This means that not guess = target and the braces are their own separate delegates. And when I made them, I was closing around the locals twice, copying them in each time. So when I got the input from the console, it went into a different closure than the while conditional. So I got to take the slapdash closure code I hacked up ages ago and make it work properly. Oh, and I needed to figure out what "properly" was, since there aren't exactly a whole lot of good tutorials on the mechanics of implementing closures.

Problem 3 was smaller thankfully. Since numbers can't be part of identifiers in Tangent, I couldn't call Int32.Parse. Oops. I just hacked that in manually for now.

Problem 4 was a standard sort of bug. When running the game, it would tell you when you guessed the number, but wouldn't exit. I figured it was another closure issue, or perhaps something with tail calls in delegates. No, I had just implemented not as the CIL opcode not, which is bitwise not, which happily worked on (bool)1, yielding true again.

Next is working on interface interop, performance improvements, or some other pragmatic step forward.

Wednesday, July 20, 2016

Basic Interop

(For those wondering, I did get delegate fields working. They were not too much trouble. You can read about them in the release notes.)

As promised, I'm taking a bit of a shift towards pragmatism in my work, meaning some efforts to do interop with existing .NET code. Doing that comes with some design decisions and coding challenges. Let's go through some of them, since they'll impact most anyone writing a compiler:


How can I do .NET interop without making it neigh impossible to target other languages?


Well, the back end of the compiler (the bits that take structured intermediate forms and generate CIL from them) already are highly coupled with .NET. That code can know all it wants to about .NET. The front end though (the bits that take text and turn it into structured intermediate forms) doesn't know much about .NET. Even the built in types are just placeholders that the back end uses to tie to real ints and strings and the such.

But because of how Tangent works, the parser needs to see all of the legal phrases so it can infer meaning in the code. That means I need something from .NET interop land to generate the phrases for the Tangent code to refer to. And that something needs to carry enough information so that the back end can turn the phrase back into .NET interaction.

In the end, I erred towards YAGNI. I mean, I'll be lucky if I get a dozen people to read this blog, let alone write code in the language. There's no need to support shifting target languages. But the engineer in me can't just couple that code meaninglessly, so it ended up as a mostly separate set of pieces with some interop agnostic interfaces to tie it into the front end. The code that imports .NET code into the front end is just sitting in the front end for ease of implementation, but is mostly separate.

How can I deal with reference types when my language doesn't have null?


I'd really like it to not have the nulls ever, but .NET interop is going to require it somehow. I could make all reference types Optional<T> sort of things, which makes the null very explicit, but the interop code verbose and unweildy. I could make the .NET types nullable, but not Tangent types which kinda works, but will carry over the problems of null and add the leaky abstraction that Tangent types behave differently from interop types. I could just not allow Tangent code to assign or check null, using exceptions to handle nullref... but that is terrible.

In the current implementation, I mostly went the optional route. Constructors return real types, this-params take real types, but all parameters and return values are Optional<T>, requiring explicit null handling in Tangent. It might suck.

How can I deal with subtyping when my language doesn't have it?


Tangent only has interface implementation sorts of subtyping. The compiler doesn't support it, and since the inference engine needs to know that sort of stuff to know what a "legal" interpretation of a phrase is, the information needs to get there somehow. I could add subtyping, but adding something to the language just for interop seems dirty. I could not do subtyping and force explicit casts, but that'd be obnoxious. The tentative plan is to implement conversion functions so that the language can shift types upwards, and have the implementation be some sort of no-op that the back end compiler can deal with. That still will require some sort of inlining so that I'm not kneecapping performance due to my own stubbornness. (More on that later).

The basics are there now though. I need to get it wired into the command line interface, but in the test code, you can specify a collection of assemblies which the code will import the best it can. Non-generic types are largely there. Field access/assignment, properties, functions, constructors, extension methods - all there for instance and static forms. This test code will now compile and run properly with the right imports:

entrypoint => void {
  print "foo".Length;
  print "foobar".Length;
}

Which is awesome. No need to manually specify bindings to .NET functions any more, and the back end work was dead simple.

The bad news is that the above example takes about 50 seconds to compile and run. The other examples I've posted, including the ones doing weird partial specialization took about 200 milliseconds. The addition of a few thousand phrases for the inference code to wade through will do that. It's not terribly surprising, but disappointing none the less.

I guess those optimizations aren't premature anymore...

Sunday, June 19, 2016

Fields

The next step after adding local variables was to extend that concept along to actual fields in classes. Not the sexiest feature in the world, but as the language pushes towards doing stuff, pragmatism will win. Fields (hopefully) look and act like you'd expect - they look like local variables without the preceding colon, and initialize when the class is created. Let's look at the test code:

foo :> new Foo(x: int) {
  (this).test : int := 42 + x;
  (this) => int { this.test } 
}

entrypoint => void {
  print new Foo 5;     // 47
  :x:foo := new Foo 5;
  print x;             // still 47
  x.test = 5;
  print x;             // 5
} 

The only weird part is that (this) bit in there. Since fields can be phrases like everything else in the language, you need to specify where the owner of the field is fetched from. The initializers can pull from the constructor parameters, and from other fields - though there is a bug currently where the compiler may not order the dependencies of initializers properly.

Slightly weird, but nothing ground breaking. You have fields, you can read from them and you can write to them. Now to something a little more weird. I also added support for fields to interfaces. They don't have initializers, but otherwise use the same syntax. Let's look at an example:

Pirate :> pirate (x: string) :< has name {
  (this).name : string := x;
}

Doctor :> doctor (x: string) :< has name {
  (this).name : string := x;
}

has name :> interface {
  (this).name : string;
}

print name of (x: has name) => void {
  print x.name;
}

entrypoint => void {
  print name of doctor "?";
  print name of pirate "Bob"; 

}

Here the interface allows unified access to the fields of two otherwise distinct types. Under the hood, this works because while I've been using "field" to refer to fields, they are implemented similarly to properties. Each field has a getter and a setter function, so the interface simply requires they be implemented. In the example above, we're only using the getter, so could be changed to this and still work:

Pirate :> pirate (x: string) :< has name {
  (this).name : string := x;
}

Doctor :> doctor (x: string) :< has name {
  (this).name : string := x;
}

has name :> interface {
  (this).name => string;
}

print name of (x: has name) => void {
  print x.name;
}

entrypoint => void {
  print name of doctor "?";
  print name of pirate "Bob"; 
}

Next on the list is delegates. Once they're in, the language should be mostly done I think. It will have all of the big language things you need to do stuff, along with a pile of bugs and none of the library/interop things necessary for you to do interesting stuff. Hopefully, delegates will go quickly and I'll figure out some project that will push the pragmatism agenda.

Saturday, June 4, 2016

Basic Local Variables

After this morning's work, Tangent now supports the most basic of local variables. Yes, the language had user defined syntax, delegates, generics, type-class style interfaces, and... no variable support. But now as I move my focus more from building cool stuff to using cool stuff to build other cool stuff, I need to fill in the gaps. That means fixing the inevitable bugs I find, and basic usability stuff like local variables, which are still going to cause trouble since there wasn't any mutable state at all in the language before today.

I wanted local variables to be able to be phrases, just like parameters and types and everything else in the language. Yet I needed them to be clear both to the compiler and the user that they were declarations and not statements, and I didn't have a lot of tokens to work with, since I want most of them to be available for users to build their own phrases with. Let's look at the test code:

entrypoint => void {
  :x:int := 42;
  print x;
  x = x + 1;
  print x;
}

So local variable declarations start with a colon, and then work like (I hope) you'd expect them to - you declare a phrase and its type. Right now, locals require initialization, which uses pascal-style assignment to differentiate it from `=`, which helps the parser (and user) avoid ambiguity. This code works as of today.

What does not work is capturing locals in lambdas. That will just throw you an exception right off. Nested scopes probably don't work. Variable hiding probably works, but is untested. And due to implementation details, this code probably compiles, but will do unknown and terrible things:

entrypoint => void {
  print x;
  :x:int := 42;
}


Because while the code is smart enough to initialize the locals at the right time, it isn't smart enough to only make them available to the scope when declared, mostly because the scope is immutable and I didn't want to rabbit hole it. 

I have not yet decided if I hate this enough to fix it.

In other usability news, I added support to the command line compiler and test suite to support multiple input files. This should let me reuse code for different programs. I expect that I'm not going to want to reimplement conditionals and loops for every program I write...

Sunday, May 29, 2016

Standalone Interface Bindings

Hello again. In Interfaces - Part 4, I talked through the basic interface support which looks mostly like C# and other popular languages, with a little more functionality. You can declare an interface, and when you declare a class, you can say that it implements that interface. Now for some things that are not like what is available in popular languages.

One of the big things about Haskell type classes is that they don't work like that. They don't require you to include the interface reference when declaring your type. Let's look back at the example from Part 4:

convertible to string :> interface {
  (this) => string;
}

Cow :> cow :< convertible to string {
  (this) => string { "moo..." }
}

(x: convertible to string) speak => void {
  print x;
}

entrypoint => void {
  cow speak;
}

The second declaration is our type declaration for Cow, along with our inline interface binding. With today's check-in you can do standalone interface binding as well:

convertible to string :> interface {
  (this) => string;
}

Cow :> cow { }

Cow :< convertible to string {
  (this) => string { "moo..." }
}

(x: convertible to string) speak => void {
  print x;
}

entrypoint => void {
  cow speak;
}

Here, Cow is tied to convertible to string in its own standalone interface binding declaration, along with the implementation for that interface. This looks and acts a lot more like Haskell type classes. With this, you can take classes defined in some other library and adapt them to your interfaces (or vice versa).

Saturday, May 21, 2016

Interfaces - Part 4 - Compilation

(Part 1Part 2, Part 3)

The last part of implementing basic interfaces is the back end compiler. This part of the code takes the intermediate representation we made in Part 3 and turns it into CIL that actually runs.

There's two parts to this, creating the actual .NET type that represents our interface, and making the functions that properly dispatch to the right implementation. What is interesting about implementing interfaces like type-classes is that the two parts work like things the language already supports, but the two parts resemble different things.

The .NET type compilation used the sum-type compilation, similar to the intermediate code from part 3. All I needed was a few tweaks to allow the "what are all of the types this could be?" part of the code to get that from the sum-type directly or from the interface mapping. Oh, and allowing sum-types to have only one implementation in the runtime (since interfaces may have only one implementation, even if sum-types can't).

The function compilation on the other hand used the dispatch logic for generics. If you remember, interface functions infer their correct type based on the location of the (this) parameter. But since the type coming in is guaranteed to be a sum-type, the generic dispatch needs to check the type of the stored variant, not the sum-type itself.

Wrap that all together, and my basic test code now compiles and runs properly:

convertible to string :> interface {
  (this) to string => string;
}

Cow :> cow :< convertible to string {
  (this) to string => string { "moo..." }
}

(x: convertible to string) speak => void {
  print x to string;
}

entrypoint => void {
  cow speak;
}

Hopefully this is straightforward - the code first declares an interface, which requires a single function. It then declares a class Cow with nullary constructor cow and implements the interface. And then it declares a function which works with that interface to print something to the screen so we can tell if it works. Finally, the entry point ties it all together by invoking that function with a concrete instance of Cow. Super dumb test code.

"So what?" you say, "Any plain interface implementation can do that."

That is entirely true. So let's covert this over to something that .NET style interfaces can't do. All the interface needs is some way to convert the object to a string. So why not just declare that?

convertible to string :> interface {
  (this) => string;
}

Cow :> cow :< convertible to string {
  (this) => string { "moo..." }
}

(x: convertible to string) speak => void {
  print x;
}

entrypoint => void {
  cow speak;

}

Yeah, that works too. Here our convertible to string interface requires its implementors to supply an implicit conversion to string. Since print requires a string as its input, the language is smart enough to infer that you want the conversion to happen there.

Next on the list is non-inline interface binding, fancier examples, and .NET interop. I'm not sure which will come first - we'll see where whimsy takes me.