Monday, December 31, 2012

The wide-ranging impact of design decisions - Free Functions

I talked about this problem briefly in my last post, but wanted to elaborate since it's an interesting lesson learned. The issue is the general desire to allow free functions to satisfy interface requirements in Tangent. Say you had a third-party library that defines a simple connection interface:

Connection => class {
  Open => void;
  Close => void;
  Read => IEnumerable<byte>;
};

But you wanted to use it with a different third-party library that provides a connectionless version:

Reader => class {
  Read => IEnumerable<byte>;
};

Sadly, you can't just inherit from Reader. It sits off in some third-party library and you don't have control over how it gets instantiated. In these cases, you end up having to use some icky wrapper to make things work. Tangent makes this better, but it is still a lot of boilerplate:

adapt (r: Reader) to Connection => Connection with class {
  Open => void {}
  Close => void {}
  Read => IEnumerable<byte> { return r.Read; }
};

So since the language allows free functions, this should work:

(r: Reader).Open => void {}
(r: Reader).Close => void {}

Since Reader can now satisfy the interface needed by Connection, everything should be good. For this (perhaps poor) example, it's not a clear win for the free function version. But consider cases with multiple parameters, as is often the case with infix operators. It quickly becomes less clear how to do the adaptation for each of the parameters involved, as well as who should own the extension.

Why doesn't this just work? Because the Open and Close extensions might not be visible in certain scopes. Once that happens, the type checker can give inconsistent answers: it verifies that Reader satisfies Connection in one scope, and then once the value is passed into a function in another scope, it suddenly doesn't.

Beyond that, the general idea that free functions can satisfy interfaces has a much larger implication that I had missed. It means the subtyping check can no longer simply accept two types as its inputs. Even if the types carry a list of the functions they require, the functions a type declares aren't the only ones that can satisfy an interface. Indeed, any function that works with the type can satisfy an interface we're checking. This is nifty and powerful, but it means the compiler actually has to go find those functions. But that's not all. Consider:

A: class {
  foo => void;
};

B: class {
  bar => void;
};

C: class {};

(b: B).foo => void {}
(c: C).bar => void {}

Is C a subtype of A? Sure. The free function bar applies to C, so C satisfies B; because C satisfies B, the free function foo applies to it as well, which satisfies A. But the subtype function actually needs to handle this case, including the sort of loops it can get into: to see if a type applies, it needs to see if a method applies, which causes it to see if a type applies, which...

So when people ask why user-defined operators aren't more common, I think of the sort of rabbit hole that a design decision like this leads to, and cannot help but think that people far smarter than I knew this decades ago, which led to languages (by and large) not mixing infix operators and objects.
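
To make the loop concrete, here is a minimal Python sketch of a subtype check where free functions count toward an interface. It is not Tangent's implementation: the names TypeDef, FreeFunction and SubtypeChecker are made up for illustration, methods are matched by name only (no signatures), and a query that is already in progress is assumed to fail so that the recursion terminates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeDef:
    name: str
    requires: frozenset  # method names the type itself declares


@dataclass(frozen=True)
class FreeFunction:
    name: str
    receiver: str  # name of the type the first parameter must satisfy


class SubtypeChecker:
    """Hypothetical checker: free functions may satisfy interface members."""

    def __init__(self, types, free_functions):
        self.types = {t.name: t for t in types}
        self.free_functions = list(free_functions)

    def is_subtype(self, candidate, target, in_progress=frozenset()):
        if candidate == target:
            return True
        query = (candidate, target)
        if query in in_progress:
            # Assumption: a question we are already trying to answer counts
            # as "no", which breaks the type -> method -> type loop.
            return False
        in_progress = in_progress | {query}
        return all(
            self.has_method(candidate, method, in_progress)
            for method in self.types[target].requires
        )

    def has_method(self, candidate, method, in_progress):
        if method in self.types[candidate].requires:
            return True
        # A free function counts only if the candidate satisfies its receiver,
        # which is itself a subtype question.
        return any(
            f.name == method
            and self.is_subtype(candidate, f.receiver, in_progress)
            for f in self.free_functions
        )


# The A, B, C example from the post.
checker = SubtypeChecker(
    [
        TypeDef("A", frozenset({"foo"})),
        TypeDef("B", frozenset({"bar"})),
        TypeDef("C", frozenset()),
    ],
    [FreeFunction("foo", "B"), FreeFunction("bar", "C")],
)
c_is_a = checker.is_subtype("C", "A")
```

Here the check for C against A has to go through B first. Two types whose free functions only apply to each other bottom out as "not a subtype" instead of recursing forever. Whether failing is the right answer for such a cycle is a design choice this sketch does not settle.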

Saturday, November 3, 2012

Showstopper!

Sorry about the lack of updates, but I've been busy at a new job and discovered a showstopper with the language design. The issue has to do with free functions, scopes, and the type system.

I want to allow extension methods to count toward the interface of a given type. That is, if you want to consume a type from a third party, but it doesn't quite fit the interface of another third-party library, you should be free to add an extension method to the type to make it fit. Alas, this leads to tons of trouble once the extension method lives in a different scope from the type it extends, the interface it's trying to fit, and/or the methods that actually require you to meet the interface.

Since that was one of the core things the module system was meant to address, the module system is probably going to be junked and/or revisited. Not sure what it's going to be yet; I might go with a similar system but limit the impact of free functions on typing (bleh), or constrain the module system to better allow this sort of behavior (more likely). But it's a matter of defining what I want, what tradeoffs are acceptable, and then doing a better job vetting the correctness of the design.

Sunday, July 22, 2012

Phrase Building

Tangent compilation is troublesome compared to traditional languages. The first place this shows up is in Phrase Building. Once the source is parsed, pretty much everything is referred to by a phrase. These phrases need to be curried such that they're boiled down into unary functions that return other functions, on downward until you get to the end of the phrase. In this way, Tangent can work with them in a nice consistent form.
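
As a rough illustration of what "boiled down into unary functions" means, here is a small Python sketch of plain currying. It only covers a phrase's parameters; it ignores the identifier words in a phrase and says nothing about how Tangent orders the pieces. The helper name curry and the example function between are made up for this sketch.

```python
def curry(fn, arity):
    """Turn an `arity`-argument function into a chain of unary functions."""

    def collect(args):
        # Once every argument has been supplied, call the real function.
        if len(args) == arity:
            return fn(*args)
        # Otherwise return a unary function that waits for the next argument.
        return lambda next_arg: collect(args + (next_arg,))

    return collect(())


def between(value, low, high):
    return low <= value <= high


# Each call takes exactly one argument and returns the next function,
# until the last argument produces the result.
curried_between = curry(between, 3)
five_between = curried_between(5)  # still waiting for low and high
```

Because every intermediate step is an ordinary unary function, a partially applied phrase such as five_between can be passed around and finished later.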

Phrase Building is one of the core pieces of Tangent though, and over all of its iterations the implementation has gotten far cleaner and more robust. This time, things are broken up enough to make testing at least mildly sane:



[TestMethod]  
public void ComplexExample() {  
  var parsedSource = Parser.Parse(
    "test", 
    @"(param b:int) (T: kind of any) ++foo (c:string) bar baz (x:int) 
:: module {};"
  );  
   
  var analyzer = new TypeAnalysis_Accessor(
    new List<TypeDefinition>() { parsedSource });  
  var anchor = analyzer.FindAnchorIn(parsedSource.Phrase);  
  var result = analyzer.DeterminePhraseOrder(
    parsedSource.Phrase, 
    anchor.Value
  );  
   
  Assert.IsTrue(
    new List<int>() { 2,3,4,1,0,5,6,7,8 }
     .SequenceEqual(result));  
}  

Wednesday, July 4, 2012

Document update

The spec has been updated to include the module changes discussed in the previous post. Slight modifications include:

  • (this). is now optional for field declarations.
  • Initializers are now required for fields except for:
    • Non-abstract types.
    • Enums (the first value is used).
    • Delegates that result in void (a no-op anonymous method is used).
  • Enums may now inherit from other enums. The values list of the sub-enum may not contain values that are not part of the base-enum.
  • Syntax for inferring generic types has been added to all phrases that take parameters (which includes classes and modules).
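
The enum inheritance rule in the list above amounts to a subset check. A minimal Python sketch, assuming enums are represented simply as lists of value names (the function name and the colour values are made up for illustration):

```python
def values_missing_from_base(base_values, sub_values):
    """Return the sub-enum values that are not part of the base enum.

    An empty result means the sub-enum is allowed to inherit from the base.
    """
    base = set(base_values)
    return [value for value in sub_values if value not in base]


colours = ["red", "green", "blue"]
warm_colours = ["red"]              # every value exists in the base enum
bad_colours = ["red", "purple"]     # "purple" is not a base value
```

Under this reading, a sub-enum narrows the base enum and never extends it, so any value of the sub-enum is also a valid value of the base.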

Wednesday, June 27, 2012

Modules (how things you ignore bite you in the ass)

Unfortunately, since my last bit of progress I realized that something I wanted to do was broken. Here is one thing Tangent should be able to do. Say you have some serialization library:

serializable => abstract class {
  serialize (this) => string;
};


// ... 

The library requires its serializable types to implement some basic function that takes an instance and returns a string. Now assume you have some type T in a different library and need to glue them together:

serialize (instance: T) => string { ... };

You should be able to just specify an implementation, and since T now satisfies the interface, it should be considered serializable. Well, it would; sometimes. Since T and the method could exist in different namespaces (and likely exist in different DLLs), the method wouldn't always be visible, so it wouldn't always...

And honestly, it would be one of those horrible bugs to track down: why is a type sometimes a sub-type and sometimes not? Further, since that relation isn't constant, it restricts how much caching can be done.
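
A small Python sketch of why the relation stops being constant, and what that does to caching. Everything here is a stand-in: the scope names glue_module and other_module are hypothetical, and methods are compared by name only.

```python
SERIALIZABLE = frozenset({"serialize"})  # methods the interface requires
T_METHODS = frozenset()                  # T declares nothing relevant itself

# Free functions applicable to T that each (hypothetical) scope can see.
SCOPES = {
    "glue_module": frozenset({"serialize"}),
    "other_module": frozenset(),
}


def satisfies(declared, required, visible_free_functions):
    # The type fits if every required method is declared on the type
    # or supplied by a free function visible in the current scope.
    return required <= (declared | visible_free_functions)


def is_serializable(scope):
    return satisfies(T_METHODS, SERIALIZABLE, SCOPES[scope])


naive_cache = {}


def naive_lookup(scope):
    # Keyed by the two types only: the first scope to ask decides the
    # answer for every other scope.
    key = ("T", "serializable")
    if key not in naive_cache:
        naive_cache[key] = is_serializable(scope)
    return naive_cache[key]


scoped_cache = {}


def scoped_lookup(scope):
    # Keyed by scope as well: correct, but far fewer cache hits.
    key = ("T", "serializable", scope)
    if key not in scoped_cache:
        scoped_cache[key] = is_serializable(scope)
    return scoped_cache[key]
```

If glue_module asks first, the naive cache will tell other_module that T is serializable even though the serialize function is not visible there. The scoped cache stays correct at the cost of one entry per scope.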

I had largely ignored namespaces, simplifying my view of things into one big flat scope so there were fewer things to track. Once again "those hairy things that are best thought about later" (tm) turn out to cause problems. They usually do.


So after doing some research, I'm aiming to provide a module system for Tangent. It will be similar to some of the more modern implementations like Scala's. Modules take the place of namespaces, but are not static. In Tangent, they will behave almost identically to classes. You can create separate instances of them (likely useful for giving threads their own sandbox). You can supply parameters to them (configs, DI, parameterization on type). You can use them as parameters (DI, runtime behaviors). You can, and likely will, mix them together (fill in partial behaviors, provide specialization). The only difference is that modules are effectively partial, which will likely lead to a few limitations due to composition order issues.

I expect that I'm missing a lot of the nuance with these systems having not used one myself. I expect that a few of these features are grand ideas that suck (or are not viable) in practice. 

This also requires a slight syntax change. The declaration arrow (=>) is now used only for methods, since there's no way to distinguish abstract methods from type declarations if the syntax can have both in the same place (which modules allow). So type declarations now use a type declaration symbol (::) between the phrase and the implementation. Modules themselves will be declared like classes, except with module rather than class (for now).

I hope to have the specification fixed to account for these changes, and then fix the parser and tests in the near future. Stay tuned.
 
 

Tuesday, May 22, 2012

Breaking Ground

The documentation has gotten to a point where the things that are well known are done, and the things that are not will not get much better by writing more documentation. So I have broken ground on (yet another) iteration of the language. The implementation focus for this one will be testability and doing only what the requirements specify. I am also going to focus on stable releases; not that anyone downloads them... That said, I've completed the first release of the new iteration:

TangentCompiler.20120522.zip

The exe is a simple command line interface for the parser. Pass in a filename and see if it parses correctly. Error messages exist, but are not very good in a number of places. Also, there are only a few places where parsing can fail in Tangent before type interpretation occurs.


One departure during this iteration is that I am hand-rolling my own lexer/parser for the language. Why? Yes, it's because I suck. Oh, why did I choose to do it you mean? Glad you asked.

In the previous few iterations I used a parsing framework to do the language parsing. It was good enough. Unfortunately, its output was a generic untyped syntax tree. A good amount of the busy-work (and errors) in the previous iterations came from walking that tree, verifying children, and transforming it into something usable. By rolling a specific parser, the result is tightly coupled to the language, meaning everything I expect to be there is there. All the different pieces of the syntax have good names and correct types. Ideally this will provide cleaner code where the parsing results are being consumed, making the implementation less complicated and thus more likely to advance successfully.
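
To show the kind of busy-work in question, here is a Python sketch contrasting a generic untyped node with a typed result. It does not reflect the actual compiler code: Node, TypeDeclaration, and the node kinds "type_decl", "phrase" and "body" are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Generic, untyped tree node of the sort a parsing framework hands back."""
    kind: str
    text: str = ""
    children: list = field(default_factory=list)


@dataclass
class TypeDeclaration:
    """What a hand-rolled parser could return directly: named, typed fields."""
    phrase: list   # the words of the declared phrase
    keyword: str   # e.g. "class" or "module"


def to_type_declaration(node):
    # The busy-work: check the shape of the generic tree before trusting it.
    if node.kind != "type_decl" or len(node.children) != 2:
        raise ValueError("expected a type_decl node with a phrase and a body")
    phrase_node, body_node = node.children
    if phrase_node.kind != "phrase" or body_node.kind != "body":
        raise ValueError("unexpected child kinds under type_decl")
    return TypeDeclaration(
        phrase=[word.text for word in phrase_node.children],
        keyword=body_node.text,
    )


generic_tree = Node(
    "type_decl",
    children=[
        Node("phrase", children=[Node("word", "serializable")]),
        Node("body", "class"),
    ],
)
declaration = to_type_declaration(generic_tree)
```

With a parser that produces TypeDeclaration itself, the verification step and its error paths disappear from every place that consumes the parse result.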

Wednesday, April 11, 2012

Document milestone

The requirements document is up around 50 pages and is not even really close to being done. Some things changed, some things have new names, some things were likely forgotten.

Tangent Programming Language Specification - 4/11/12

Thursday, March 8, 2012

Aren't you supposed to do requirements first?

It has been a while since posting. A while since working on Tangent really. The project ran into a rather large, hard to find blocking bug. Combined with work and the time it takes to get back up to speed between coding sessions, nothing got done.

After some consideration and discussion with peers, the project is going to take a few steps back. Some requirements are going to be defined to help with the 'up to speed' problem. They will also help expand the automated tests and aid in refactoring the code into something less complex. More tests and less complexity should lead to easier debugging.

The current requirements doc can be found here. At the time of writing, the introduction and language overview are done. The nitty-gritty specification and grammar are yet to come. As always, feedback of any sort is welcome.