As promised, I'm taking a bit of a shift towards pragmatism in my work, meaning some efforts to do interop with existing .NET code. Doing that comes with some design decisions and coding challenges. Let's go through some of them, since they'll impact most anyone writing a compiler:
How can I do .NET interop without making it nigh impossible to target other languages?
Well, the back end of the compiler (the bits that take structured intermediate forms and generate CIL from them) is already highly coupled to .NET. That code can know all it wants to about .NET. The front end, though (the bits that take text and turn it into structured intermediate forms), doesn't know much about .NET. Even the built-in types are just placeholders that the back end ties to real ints and strings and the like.
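As a rough illustration (Java standing in for the implementation language here, and all the names are mine rather than anything in the actual compiler), the split amounts to the front end dealing in opaque placeholders while only the back end knows what they map to:

    import java.util.Map;

    // Front-end view: built-in types are opaque placeholders with no .NET knowledge.
    enum BuiltinType { INT, STRING, VOID }

    // Back-end view: at CIL-generation time the placeholders get tied to real CLR types.
    class BackEndTypeMap {
        static final Map<BuiltinType, String> CIL_NAMES = Map.of(
                BuiltinType.INT, "System.Int32",
                BuiltinType.STRING, "System.String",
                BuiltinType.VOID, "System.Void");
    }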
But because of how Tangent works, the parser needs to see all of the legal phrases so it can infer meaning from the code. That means I need something from .NET interop land to generate the phrases for Tangent code to refer to. And that something needs to carry enough information for the back end to turn the phrase back into a .NET interaction.
In the end, I erred towards YAGNI. I mean, I'll be lucky if I get a dozen people to read this blog, let alone write code in the language. There's no need to support shifting target languages. But the engineer in me can't just couple that code needlessly, so it ended up as a mostly separate set of pieces with some interop-agnostic interfaces to tie it into the front end. The code that imports .NET code into the front end just sits in the front end for ease of implementation, but is mostly separate.
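If it helps to picture the seam, here's a hypothetical sketch of that kind of interop-agnostic interface (again in Java, with names I made up for this post, not the actual Tangent code): the front end only sees things that can produce phrases, and the .NET importer is just one implementation hiding behind it.

    import java.util.List;

    interface PhraseSource {
        // Each phrase carries enough payload for the back end to emit the right CIL.
        List<PhraseDeclaration> phrases();
    }

    record PhraseDeclaration(List<String> pattern, Object backEndPayload) { }

    class DotNetAssemblyImporter implements PhraseSource {
        private final String assemblyPath;

        DotNetAssemblyImporter(String assemblyPath) { this.assemblyPath = assemblyPath; }

        @Override
        public List<PhraseDeclaration> phrases() {
            // Reflecting over the assembly and building phrases would go here; elided.
            return List.of();
        }
    }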
How can I deal with reference types when my language doesn't have null?
I'd really like it to not have nulls at all, but .NET interop is going to require them somehow. I could make all reference types Optional<T> sorts of things, which makes the null very explicit but makes the interop code verbose and unwieldy. I could make the .NET types nullable but not Tangent types, which kinda works, but carries over the problems of null and adds the leaky abstraction that Tangent types behave differently from interop types. I could just not allow Tangent code to assign or check null, using exceptions to handle null refs... but that is terrible.
In the current implementation, I mostly went the optional route. Constructors return real types and this-params take real types, but all other parameters and return values are Optional<T>, requiring explicit null handling in Tangent. It might suck.
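In rough Java terms (with Optional standing in for whatever Tangent's optional type ends up looking like, and all of this my illustration rather than the actual binding code), the rule comes out something like this:

    import java.util.Map;
    import java.util.Optional;

    class InteropShape {
        // Imagine this is the raw .NET call: it can hand back null.
        static String rawLookup(Map<String, String> map, String key) {
            return map.get(key);
        }

        // The imported phrase surfaces the reference-typed result as an optional,
        // so callers have to handle the missing case explicitly.
        static Optional<String> lookup(Map<String, String> map, String key) {
            return Optional.ofNullable(rawLookup(map, key));
        }

        public static void main(String[] args) {
            int length = lookup(Map.of("foo", "bar"), "missing")
                    .map(String::length)
                    .orElse(0); // explicit "no value" handling
            System.out.println(length); // prints 0
        }
    }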
How can I deal with subtyping when my language doesn't have it?
Tangent only has interface-implementation sorts of subtyping. The compiler doesn't support anything more, and since the inference engine needs to know that sort of stuff to know what a "legal" interpretation of a phrase is, the information needs to get there somehow. I could add subtyping, but adding something to the language just for interop seems dirty. I could skip subtyping and force explicit casts, but that'd be obnoxious. The tentative plan is to implement conversion functions so that the language can shift types upwards, with the implementation being some sort of no-op that the back-end compiler can deal with. That will still require some sort of inlining so that I'm not kneecapping performance due to my own stubbornness. (More on that later.)
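In spirit, those conversion functions are just identity upcasts. Something like this Java-flavored sketch (my illustration; the real thing would be generated per imported type pair and erased or inlined by the back end):

    class Upcasts {
        static class Base { }
        static class Derived extends Base { }

        // A generated conversion phrase: semantically a no-op, it just lets the
        // inference engine treat a Derived where a Base is expected as legal.
        static Base asBase(Derived d) {
            return d; // nothing happens at runtime; the back end can inline this away
        }

        static void printBase(Base b) { System.out.println(b); }

        public static void main(String[] args) {
            // Without real subtyping in the source language, the call goes through
            // the explicit (but generated) conversion function.
            printBase(asBase(new Derived()));
        }
    }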
The basics are there now though. I need to get it wired into the command line interface, but in the test code you can specify a collection of assemblies, which the code will import as best it can. Non-generic types are largely there. Field access/assignment, properties, functions, constructors, extension methods - all there for instance and static forms. This test code will now compile and run properly with the right imports:
entrypoint => void {
print "foo".Length;
print "foobar".Length;
}
Which is awesome. No need to manually specify bindings to .NET functions anymore, and the back end work was dead simple.
The bad news is that the example above takes about 50 seconds to compile and run. The other examples I've posted, including the ones doing weird partial specialization, took about 200 milliseconds. The addition of a few thousand phrases for the inference code to wade through will do that. It's not terribly surprising, but disappointing nonetheless.
I guess those optimizations aren't premature anymore...