Monday, December 15, 2014

Progress - Expression Parsing

Work continues. After an aborted first attempt that did not deal with ambiguity well, and a bout of coder's block, the first swipe at the second stage of the parser is dev-done. The code can (read: should) now parse statements when given an arbitrary identifier stream and a scope. Scopes are simple now. No nesting. Just type constants (which aren't used yet), parameter declarations and functions in separate bundles to keep things simple.

Tests exist for:
  • step from identifiers to parameter reference
  • step from identifiers to function binding
  • step from function binding to function invocation
  • function consumption of non-terminals (values of a given type)
  • priority given to parameters when ambiguous
  • use of lower priority when it can successfully parse
Not that it actually does anything yet. All the code does is take identifiers and give you a parse tree for that statement. More than one parse tree actually if the statement is ambiguous in the current scope. It also likely behaves poorly when there are cycles in the grammar/function declarations.
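
For illustration only, here is a rough Python sketch of the behavior described above; it is not the actual Tangent code. The names `Slot`, `Function`, `Node` and `parse`, and the two-tier priority scheme, are assumptions made for this example. It reduces an identifier stream bottom-up, tries parameter references first, and only falls back to function bindings when the parameter reading cannot produce a full parse. Like the real thing, it makes no attempt to cope with cyclic declarations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    """A phrase part that consumes a value of the given type."""
    type: str


@dataclass(frozen=True)
class Function:
    phrase: tuple   # identifiers (str) and Slot entries, in order
    returns: str


@dataclass(frozen=True)
class Node:
    """A parse-tree node; `type` is the non-terminal it stands for."""
    label: str
    type: str
    children: tuple = ()


def matches(part, item):
    if isinstance(part, Slot):
        return isinstance(item, Node) and item.type == part.type
    return item == part  # literal identifier


def reductions(items, params, functions):
    """Yield (tier, new_items). Tier 0 = parameter reference, tier 1 = function."""
    for i, item in enumerate(items):
        if isinstance(item, str) and item in params:
            node = Node("param " + item, params[item])
            yield 0, items[:i] + (node,) + items[i + 1:]
    for fn in functions:
        n = len(fn.phrase)
        for i in range(len(items) - n + 1):
            window = items[i:i + n]
            if all(matches(p, w) for p, w in zip(fn.phrase, window)):
                args = tuple(w for w in window if isinstance(w, Node))
                name = " ".join(p if isinstance(p, str) else "_" for p in fn.phrase)
                node = Node("call " + name, fn.returns, args)
                yield 1, items[:i] + (node,) + items[i + n:]


def parse(items, params, functions, goal="void"):
    """Return every distinct parse tree that reduces `items` to `goal`."""
    items = tuple(items)
    if len(items) == 1 and isinstance(items[0], Node) and items[0].type == goal:
        return [items[0]]
    by_tier = {0: [], 1: []}
    for tier, nxt in reductions(items, params, functions):
        by_tier[tier].append(nxt)
    for tier in (0, 1):  # the lower tier is only used if the higher one fails
        trees = []
        for nxt in by_tier[tier]:
            for tree in parse(nxt, params, functions, goal):
                if tree not in trees:
                    trees.append(tree)
        if trees:
            return trees
    return []


# Hypothetical scope: `print (c: color) => void` and `x => color`.
functions = [Function(("print", Slot("color")), "void"), Function(("x",), "color")]

# `x` is also a parameter of type color: the parameter reading wins.
print(parse(["print", "x"], {"x": "color"}, functions))

# `x` is a parameter of the wrong type: the function reading is used instead.
print(parse(["print", "x"], {"x": "shape"}, functions))
```

Trying one tier at a time keeps the priority rule local to each reduction step; a real implementation would probably want something less brute-force than re-exploring every reduction order.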

Still, I am happy to have the core that makes Tangent different there and working. And happy that progress continues.

Saturday, December 6, 2014

Progress - Initial Iteration

The first iteration is progressing. As much as I would like type declarations and functions to be very similar, they are currently very distinct. This is for simplicity and the whole "don't get fancy" focus. The current language is dead simple (not lisp dead simple, but dead simple compared to what I'd like eventually):

program ::= (type-decl | function-decl)*
type-decl ::= id+ :> enum { comma-delimited-id-list }
function-decl ::= phrase-part+ => id+ { statement* }
phrase-part ::= id | parameter-decl
parameter-decl ::= '(' id+ : id+ ')'
statement ::= id+ ;

The rest of the work is done in the second step of the parser, the part that we're focusing on this time around. That part takes the various statements and performs bottom-up parsing on them using the type declarations from the first (conventional) step of the parser. The void type is used to define the start rule for statements, and other rules will need to be used to help deal with potential ambiguities (like preferring an id literal over a parameter reference).

Otherwise, it should proceed pretty much like any other bottom-up parsing approach, albeit providing an extensible syntax while demanding nothing more from programmers than they're used to - declaring and annotating types. Currently, the first step of the parser is dev-done. 
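
To make the "types as non-terminals" idea concrete, here is a small Python sketch that rewrites a function declaration as a grammar production. It is purely illustrative: `Param`, `as_production` and the example declarations are assumptions made for this sketch, not part of the Tangent code. Parameters become non-terminals named after their type, plain identifiers become terminals, and the return type is the left-hand side, so a statement parses when it reduces to the void non-terminal.

```python
from typing import NamedTuple


class Param(NamedTuple):
    """A parameter-decl such as `(c: color)`."""
    name: str
    type: str


def as_production(phrase, returns):
    """Render `phrase => returns` as a BNF-style production string."""
    rhs = []
    for part in phrase:
        if isinstance(part, Param):
            rhs.append("<" + part.type + ">")   # non-terminal: any value of that type
        else:
            rhs.append('"' + part + '"')        # terminal: the identifier itself
    return "<" + returns + "> ::= " + " ".join(rhs)


# Hypothetical declarations:
#   print (c: color) => void { ... }
#   opposite of (c: color) => color { ... }
print(as_production(["print", Param("c", "color")], "void"))
print(as_production(["opposite", "of", Param("c", "color")], "color"))
```

Under this reading, `print opposite of red;` would reduce `red` to a color, `opposite of <color>` to a color, and `print <color>` to void, assuming `red` is a value of a hypothetical `color` enum.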

Wednesday, November 26, 2014

Once more unto the breach

It has been a while.

After the last failure, work (and school) showed up in force. I did not have much time for Tangent, or more interesting things for that matter. But now work has died down, as has class. I've (finally) picked up the Dragon Book, and realized that much of what I was doing with Tangent types was very similar to the use of non-terminals in formal grammars. I asked a question about it on StackExchange, which... didn't really go as hoped. Though it did remind me of the vast gap between professionals and academics.

Which in turn provided a bit of motivation to get off my ass and figure things out on my own. So I've started in on yet another version of Tangent - this time on GitHub. And this time, the approach will be to have an exceptionally minimal end-to-end implementation that is built upon. The first step is to do enough to declare symbols, declare (unary) functions, parse expressions using the type info, and debug-print symbols. No subtyping. No records. No type operators. No strings, no ints, no bools.

Just enough to implement the core extensible syntax successfully. Don't get fancy. Don't get greedy.

Monday, December 31, 2012

The wide ranging impact of design decisions - Free Functions

I talked about this problem a little bit in my last post, but wanted to elaborate on it a bit since it's an interesting lesson learned. The issue in question is the general desire to allow free functions to satisfy interface requirements in Tangent. Say you had a third party library that defines a simple connection interface:
Connection => class {
  Open => void;
  Close => void;
  Read => IEnumerable<byte>;
};
But you wanted to use it with a different third party library that provided a connectionless version:
Reader => class {
  Read => IEnumerable<byte>;
};
Sadly, you can't just inherit from Reader. It sits off in some 3rd party library and you don't have control over how it gets instantiated. In these cases, you'll end up having to use some icky wrapper to make stuff work. Tangent makes this better, but it is still a lot of boilerplate:
adapt (r: Reader) to Connection => Connection with class {
  Open => void {}
  Close => void {}
  Read => IEnumerable<byte> { return r.Read; }
};
So since the language allows free functions, this should work:
(r: Reader).Open => void {}
(r: Reader).Close => void {}
Since Reader can now satisfy the interface needed by Connection, everything should be good. For this (perhaps poor) example, it's not a clear win for the free function version. But consider cases with multiple parameters, as is often the case with infix operators. It quickly becomes less clear how to do the adaptation for each of the different parameters involved, as well as who should own the extension.

Why doesn't this just work? Because the Open and Close extensions might not be visible in certain scopes. Once that happens, the type checker can be incorrect: it verifies that Reader satisfies Connection in one scope, and then once the value is passed into a function, it suddenly doesn't.

Beyond that, this general idea that free functions can satisfy interfaces had a much larger implication I had missed. It means the subtyping check cannot simply accept two types as its inputs anymore. Even if the types have a list of functions that they require, those functions aren't the only ones that can satisfy an interface. Indeed, any function that works with the type can satisfy an interface we're checking. This is nifty and powerful, but means the compiler actually has to do it. But that's not all. Consider:
A: class {
  foo => void;
};

B: class {
  bar => void;
};

C: class {};

(b: B).foo => void {}
(c: C).bar => void {}
Is C a subtype of A? Sure. It ends up satisfying those constraints, but the subtyping check actually needs to handle this case, including the sort of loops it can get into: to see if a type applies, it needs to see if a method applies, which causes it to see if a type applies, which...

So when people ask why user-defined operators aren't more common, I think of this sort of rabbit hole that a design decision leads to, and cannot help but think that people far smarter than I knew this decades ago, which led to languages (by and large) not mixing infix operators and objects.
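
Both problems in this post, the scope-dependent answer and the type/method loop, can be shown in a few lines of Python. This is a toy model under loud assumptions: `DECLARED`, `is_subtype` and `has_member` are invented for the sketch, members are matched by name only (signatures are ignored), and a check that is already in progress is treated as "not proven" to break the loop. A real checker could just as reasonably assume the opposite.

```python
# Members each type declares itself (signatures ignored for brevity).
DECLARED = {
    "Connection": {"Open", "Close", "Read"},
    "Reader": {"Read"},
    "A": {"foo"},
    "B": {"bar"},
    "C": set(),
}


def is_subtype(sub, sup, free_functions, in_progress=frozenset()):
    """Does `sub` offer every member `sup` declares, given the free functions
    visible in the current scope as (receiver type, member name) pairs?"""
    if sub == sup:
        return True
    if (sub, sup) in in_progress:
        return False  # already being checked further up the stack: break the loop
    in_progress = in_progress | {(sub, sup)}
    return all(
        has_member(sub, member, free_functions, in_progress)
        for member in DECLARED[sup]
    )


def has_member(type_name, member, free_functions, in_progress):
    if member in DECLARED[type_name]:
        return True
    # A free function counts if the type can be passed as its receiver,
    # which is itself a subtyping question. This is where the loop comes from.
    return any(
        name == member and is_subtype(type_name, receiver, free_functions, in_progress)
        for receiver, name in free_functions
    )


extensions = {("Reader", "Open"), ("Reader", "Close")}
print(is_subtype("Reader", "Connection", extensions))  # extensions in scope
print(is_subtype("Reader", "Connection", set()))       # extensions not visible

abc_scope = {("B", "foo"), ("C", "bar")}
print(is_subtype("C", "A", abc_scope))
```

The same pair of types gives a different answer depending on which free functions are passed in, which is exactly why the check can no longer take just two types as its inputs.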

Saturday, November 3, 2012

Showstopper!

Sorry about the lack of updates, but I've been busy at a new job and discovered a showstopper with the language design. The issue has to do with free functions, scopes, and the type system. I want to allow extension methods to count for a given type. That is, if you want to consume a type from a 3rd party, but it doesn't quite fit into the interface of another 3rd party interface, you should be free to add an extension method to the type to fit it into the interface. Alas, this leads to tons of trouble once the extension method lives in different scopes from the type it extends, the interface it's trying to fit, and/or the methods that actually require you meet the interface.

Since that was one of the core things that the module system was meant to address, it's probably going to be junked and/or revisited. Not sure what it's going to be yet; I might go with a similar system, but limit the impact of free functions to typing (bleh) or constrain the module system to better allow this sort of behavior (more likely). But it's a matter of defining what I want, what tradeoffs are acceptable, and then doing a better job vetting the correctness of the design.

Sunday, July 22, 2012

Phrase Building

Tangent compilation is troublesome compared to traditional languages. The first place this shows up is in Phrase Building. Once the source is parsed, pretty much everything is referred to by a phrase. These phrases need to be curried such that they're boiled down into unary functions that return other functions, on downward until you get to the end of the phrase. In this way, Tangent can work with them in a nice consistent form.
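
As a generic illustration of the currying described here (not Tangent's actual phrase builder), the following Python sketch turns a multi-parameter function into a chain of unary functions. The helper name `curry` and the left-to-right parameter order are assumptions; the real compiler determines the order from the phrase itself.

```python
def curry(fn, arity):
    """Turn fn(a, b, c) into f(a)(b)(c): unary functions returning functions."""
    def take(args):
        if len(args) == arity:
            return fn(*args)               # end of the phrase: do the actual call
        return lambda x: take(args + (x,))  # otherwise wait for the next argument
    return take(())


def describe(b, c, x):
    return "b=%s c=%s x=%s" % (b, c, x)


step1 = curry(describe, 3)   # unary: takes b
step2 = step1(1)             # unary: takes c
step3 = step2("two")         # unary: takes x
print(step3(3))
```

Each intermediate step is an ordinary unary function, which is what lets every phrase be handled in one consistent form regardless of how many parameters it declares.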

Phrase building is one of the core pieces of Tangent though, and over all of its iterations, the implementation has gotten far cleaner and more robust. This time though, things are broken up enough to make testing at least mildly sane:



[TestMethod]  
public void ComplexExample() {  
  var parsedSource = Parser.Parse(
    "test", 
    @"(param b:int) (T: kind of any) ++foo (c:string) bar baz (x:int) 
:: module {};"
  );  
   
  var analyzer = new TypeAnalysis_Accessor(
    new List<TypeDefinition>() { parsedSource });  
  var anchor = analyzer.FindAnchorIn(parsedSource.Phrase);  
  var result = analyzer.DeterminePhraseOrder(
    parsedSource.Phrase, 
    anchor.Value
  );  
   
  Assert.IsTrue(
    new List<int>() { 2,3,4,1,0,5,6,7,8 }
     .SequenceEqual(result));  
}  

Wednesday, July 4, 2012

Document update

The spec has been updated to include the module changes discussed in the previous post. Slight modifications include:

  • (this). is now optional for field declarations.
  • Initializers are now required for fields except for:
    • Non-abstract types.
    • Enums (the first value is used).
    • Delegates that result in void (a no-op anonymous method is used).
  • Enums may now inherit from other enums. The values list of the sub-enum may not contain values that are not part of the base-enum.
  • Syntax for inferring generic types has been added to all phrases that take parameters (which includes classes and modules).