Tuesday, May 22, 2012

Breaking Ground

The documentation has gotten to a point where the things that are well known are done, and the things that are not will not get much better writing documentation. So I have broken ground on (yet another) iteration of the language. The implementation focus for this one will be testability and doing only what the requirements specify. I am also going to focus on stable releases; not that anyone downloads them... That said, I've completed the first release of the new iteration:

TangentCompiler.20120522.zip

The exe is a simple command line interface for the parser. Pass in a filename and see if it parses correctly. Error messages exist, but are not very good in a number of places. That and there are only a few places where parsing can fail in Tangent before the type interpretation occurs.


One departure during this iteration is that I am hand-rolling my own lexer/parser for the language. Why? Yes, it's because I suck. Oh, why did I choose to do it you mean? Glad you asked.

In the previous few iterations I used the parsing framework to do the language parsing. It was good enough. Unfortunately, its output was a generic untyped syntax tree. A good amount of busy-work (and errors) in the previous iterations was walking the list, verifying children and transforming it into something usable. By rolling a specific parser, the result is tightly coupled to the language meaning everything I expect to be there is there. All the different pieces of the syntax have good names and correct types. Ideally this will provide cleaner code where the parsing results are being consumed, making the implementation less complicated and thus more likely to advance successfully.

Wednesday, April 11, 2012

Document milestone

The requirements document is up around 50 pages and is not even really close to being done. Some things changed, some things have new names, some things were likely forgotten.

Tangent Programming Language Specification - 4/11/12

Thursday, March 8, 2012

Aren't you supposed to do requirements first?

It has been a while since posting. A while since working on Tangent really. The project ran into a rather large, hard to find blocking bug. Combined with work and the length it takes to get up to speed again in between coding sessions, nothing got done.

After some consideration and discussion with peers the project is going to take a few steps back. There is going to be some requirements defined to help with the 'up to speed' problem. They will also help expand the automated tests, and aid refactoring the code to something less complex. More tests, less complexity should lead to easier debugging.

The current requirements doc can be found here. At time of writing, the introduction and language overview is done. Nitty gritty specification and grammar are yet to come. As always, feedback of any sort is welcome.

Sunday, October 23, 2011

Code generation rework and cheap parlor tricks.

Far too long since my last post. I'd like to say I'd been hard at work making cool new features, but if you've been reading at all, you'll know how much of a lie that is. Plus now I'm sleepy, so instead of doing compiler debugging while sleepy I'll get the handful of non-robotic journal readers some amusement.

The big change going on recently is re-working how I was doing the dynamic dispatch in Tangent. The previous way involved defining a method on my runtime object, sending in a parameter, doing some inspection and then calling the right method out of a dictionary. Quick, easy, functional. But it rubbed me the wrong way (not that way), and was increasingly causing me grief when trying to plan out next steps. It's poor performance, it's difficult to optimize, it's easier to debug, but harder to incorporate native .NET calls. And it makes parameter aggregation painful.

So that is out, now replaced with actual bonafide compiled CIL. The compiler makes a few quick optimizations based on types, but mostly just brute force dispatches based on what the method overloads are for that instance. And on the upside, it now nicely throws exceptions if we get into a weird case where a call is ambiguous or unknown. And (the core reason behind the work since last post), methods actually can accept and curry parameters now. All of the examples before were simple methods or only required the type of the parameter, not the value. Alas, there's still some bugs there so even the previous code examples don't work at the moment.

But I'm going to talk a little bit about what's in the works since it's both nifty, and a horrible idea.

One of the main goals of Tangent is to allow unparalleled ease and flexibility in creating domain specific languages, or even more general purpose dialects that can interoperate and evolve as programmers need them to. The core syntax defines some syntax just to make the basics familiar to programmers. Function declaration, class declaration, type operations, assignment, return... But I'd like to keep the set of keywords (and reserved symbols) relatively small.

The experimentation recently has involved just how small can be reasonably provided to the programmer and still be able to provide via some core library that provides the dialect of Tangent that is used. Once these little bugs are taken care of, Tangent should be able to support in-code definition of your standard if/else block.

For those of you who have not fainted or are on your way to stop more of this insanity, here's some of the details. The first core bit I need is the dynamic dispatch supported by methods already. And I need to be able to define specialized symbols/identifiers in source. You've seen these before.

There are two new things that you've not seen. The first is the Lazy operator (which might get a better name later). It is a built-in type operator that asks for a fully bound method resulting in the specified type. Since every access in Tangent (currently) is done via curried method, this works pretty well. So in Tangent you can define a type like ~> void which is simply 'some expression that results in void'. Once a variable of that type is put into an expression where it needs to provide void the variable is executed. This sort of thing will be used many places where you'd normally pass a lambda, but will (ideally) lead to cleaner, more readable code at the call site.

The second concept is simply that the language will treat your standard block as a nullary closure. Which is just fancy terminology for method of type ~> void. It looks like a block you've used, it behaves like a block you've used, but it can be passed into methods and it can be stored in variables.

Using these pieces, we can define the standard if/else block in code (using an enum, since .NET bools aren't accessible yet):


Type bool => enum{ values{ True, False } };
if (condition: bool) (block: ~> void) => void { block; };
if (condition: bool) (block: ~> void) 
  else (else block: ~> void) => void { block; };
if (condition: bool.False.Type) (block: ~> void) => void { };
if (condition: bool.False.Type) (block: ~> void) 
  else (else block: ~> void) => void { else block; };
entrypoint => void {
  if bool.True {
    debug "True!";
  };
};

Untested of course, since the code gen is still buggy. Hopefully there's not some trivial bug, but I'm not sure anyone reads this anyways.

One interesting trick here is specializing the method on False. Why false? Because remember that null in Tangent is essentially the top type. If I specified True as the specialization, null would trigger that. If I specified both True and False, then sending in null would cause an AmbiguousInvocationException. This way, if you send in null, it behaves as False. And the scary/nifty thing? If your dialect wants different behavior, you can go in and define that.

Thursday, September 15, 2011

The dark depths of this beast.



First a warning for those with heart conditions, epilepsy, or are pregnant; this post contains code snippets that have maimed the hale and healthy. YOU'VE BEEN WARNED!

One of the things I noticed while doing the last bit of work is how often the core part of the compiler iterates over the same damned thing. It needs to take a sequence of tokens and figure out what order of execution is necessary so that they 'make sense'. It's the core idea of the language, and making it work well, reliably, and fast/scalably is going to be key to the thing being viable. And frankly, its the sort of thing that the experimental language is there to experiment with.

What I was doing was simply iterating through the collection of tokens, if I found one to reduce, I did the reduction and then via recursion worked on the now smaller collection. Eventually I'd try everything and it'd fail, or I'd find what I was looking for and exit out early.

That though starts at the beginning each time. And it doesn't take any sort of type info into account when reducing things to see if we're "closer" to our answer. And it provides poor support for partial consumption, like the implicit feature I discussed earlier (this was actually the downfall of the previous iteration).

So I spend the past few days thinking and tonight redoing that core bit of code. And I now have the most horrible C# I've ever seen:

 using System;  
 using System.Collections.Generic;  
 using System.Linq;  
 using System.Text;  
 using Tangent.Lang.Compilation.IntermediateForm;  
 using System.CodeDom.Compiler;  
 using System.Threading.Tasks;  
 using System.Collections.Concurrent;  
   
 namespace Tangent.Lang.Compilation {  
   public partial class Compiler {  
     private IEnumerable<List<List<Element>>> UnflattenElements(List<Element> flatElements, Type targetType, List<Type> scope, CompilerErrorCollection errors, ConsumeDirection consumeDirection) {  
   
       System.Diagnostics.Debug.WriteLine("[{0}] <- {1}", targetType, string.Join(":", flatElements));  
   
       if (targetType == Type.ExecutionToken) {  
         yield return new List<List<Element>>() { flatElements };  
         yield break;  
       }  
   
       if (flatElements.Count == 0) {  
         yield break;  
       }  
   
       Element test = null;  
       if (consumeDirection == ConsumeDirection.Right) {  
         test = flatElements.First();  
       } else {  
         test = flatElements.Last();  
       }  
   
       if (targetType.IsExplicitIdentifier) {  
         if (!string.IsNullOrWhiteSpace(test.Identifier) && targetType == LiteralTypeFor(test.Identifier)) {  
           List<Element> rework = new List<Element>(flatElements);  
           rework[consumeDirection == ConsumeDirection.Right ? 0 : flatElements.Count - 1] = new IdentifierLiteral(test.Identifier, LiteralTypeFor(test.Identifier));  
           yield return new List<List<Element>>() { rework };  
           yield break;  
         }  
       }  
   
       if (flatElements.Count == 1) {  
         if (flatElements[0].EffectiveType != null && flatElements[0].EffectiveType.IsSubTypeOf(targetType)) {  
           yield return new List<List<Element>>() { flatElements };  
           yield break;  
         }  
       }  
   
       if (test.EffectiveType != null && test.EffectiveType.IsSubTypeOf(targetType)) {  
         yield return new List<List<Element>>() { flatElements };  
       }  
   
       int start = consumeDirection == ConsumeDirection.Right ? 0 : flatElements.Count - 1;  
       int end = consumeDirection == ConsumeDirection.Right ? flatElements.Count : -1;  
       int step = consumeDirection == ConsumeDirection.Right ? 1 : -1;  
       for (int i = start; i != end; i += step) {  
         IEnumerable<List<MethodDefinition>> possibilityStack = null;  
         List<Element> currentResolutionState = null;  
         if (flatElements[i] is ReResolveElement) {  
           possibilityStack = ResolveIdentifier(flatElements[i].Identifier, scope, errors)  
             .Signal(elements => currentResolutionState=elements)  
             .Select(elements => elements.Select(element => element.EffectiveType.ReductionRules[0]).ToList());  
   
         } else if (flatElements[i].EffectiveType.ReductionRules.Any()) {  
           possibilityStack = new List<List<MethodDefinition>>() { new List<MethodDefinition>(flatElements[i].EffectiveType.ReductionRules) };  
         }  
   
         if (possibilityStack != null) {  
           foreach (var possibles in possibilityStack) {  
   
             while (possibles.Any()) {  
               HashSet<MethodDefinition> best = HighestPriorityMethods(possibles, consumeDirection == ConsumeDirection.Right ? ConsumeDirection.Left : ConsumeDirection.Right);  
               possibles.RemoveAll(def => best.Contains(def));  
               List<Element> paramElements;  
   
               if (best.First().Direction == ConsumeDirection.Right) {  
                 if (i == flatElements.Count - 1) {  
                   paramElements = new List<Element>();  
                 } else {  
                   paramElements = new List<Element>(flatElements.GetRange(i + 1, flatElements.Count - i - 1));  
                 }  
               } else {  
                 if (i == 0) {  
                   paramElements = new List<Element>();  
                 } else {  
                   paramElements = new List<Element>(flatElements.GetRange(0, i));  
                 }  
               }  
   
               Dictionary<MethodDefinition, IEnumerator<List<List<Element>>>> possibleParamPermutations = new Dictionary<MethodDefinition, IEnumerator<List<List<Element>>>>();  
               foreach (var entry in best) {  
                 possibleParamPermutations[entry] = UnflattenElements(paramElements, entry.Consumes, scope, errors, entry.Direction).GetEnumerator();  
               }  
   
               while (possibleParamPermutations.Any()) {  
                 bool permutationsLeft = false;  
                 do {  
                   permutationsLeft = false;  
                   List<List<Element>> passback = new List<List<Element>>();  
                   Dictionary<MethodDefinition, IEnumerator<List<List<Element>>>> possibleSolutions = new Dictionary<MethodDefinition, IEnumerator<List<List<Element>>>>();  
   
                   foreach (var entry in best.Where(def => possibleParamPermutations.ContainsKey(def))) {  
                       
                     if (possibleParamPermutations[entry].MoveNext()) {  
                       permutationsLeft = true;  
                       foreach (var paramPermutation in possibleParamPermutations[entry].Current) {  
                         possibleSolutions[entry] = UnflattenElements(Reduce(flatElements, i, paramPermutation, entry, currentResolutionState), targetType, scope, errors, consumeDirection).GetEnumerator();  
                         if (!possibleSolutions[entry].MoveNext()) {  
                           possibleSolutions.Remove(entry);  
                         }  
                       }  
                     } else {  
                       // no op; eventually permutationsLeft will remain false.  
                       possibleParamPermutations.Remove(entry);  
                     }  
                   }  
   
                   if (possibleSolutions.Any()) {  
                     List<MethodDefinition> solutionDefs = new List<MethodDefinition>(possibleSolutions.Keys);  
                     while (solutionDefs.Any()) {  
                       HashSet<MethodDefinition> bestSolutions = HighestPriorityMethods(solutionDefs, consumeDirection == ConsumeDirection.Right ? ConsumeDirection.Left : ConsumeDirection.Right);  
                       solutionDefs.RemoveAll(def => bestSolutions.Contains(def));  
   
                       List<MethodDefinition> cleanup = new List<MethodDefinition>();  
                       while (bestSolutions.Any()) {  
                           
                         foreach (var bestSol in bestSolutions) {  
                           System.Diagnostics.Debug.WriteLine(string.Format("({2}) {0} : [{1}]", bestSol, string.Join(" =||= ", possibleSolutions[bestSol].Current.Select(line => string.Join(":", line))), flatElements.Count));  
                           passback.AddRange(possibleSolutions[bestSol].Current);  
                           if (!possibleSolutions[bestSol].MoveNext()) {  
                             cleanup.Add(bestSol);    
                           }  
                         }  
   
                         bestSolutions.RemoveWhere(key=>cleanup.Contains(key));  
                         cleanup.Clear();  
                         yield return passback;  
                       }  
                     }  
                   }  
                 } while (permutationsLeft == true);  
               }  
             }  
           }  
         }  
       }  
     }  
   
     private List<Element> Reduce(List<Element> flatElements, int ix, List<Element> paramPermutation, MethodDefinition signature, List<Element> currentResolutionState) {  
       List<Element> result = new List<Element>();  
       Func<int, Element> getReductionElement = (i) => {  
         if (flatElements[i] is ReResolveElement) {  
           if (currentResolutionState == null) {  
             throw new ArgumentNullException();  
           }  
   
           return currentResolutionState.Where(e => e.EffectiveType.ReductionRules[0] == signature).First();  
         } else {  
           return flatElements[i];  
         }  
       };  
         
       if (signature.Consumes == Type.ExecutionToken) {  
   
         if (ix != 0) {  
           result.AddRange(flatElements.GetRange(0, ix));  
         }  
   
         result.Add(new ReductionElement(getReductionElement(ix), new Literal("", Type.ExecutionToken, null), signature));  
   
         if (ix + 1 != flatElements.Count) {  
           result.AddRange(flatElements.GetRange(ix + 1, flatElements.Count - ix - 1));  
         }  
   
         return result;  
       }  
   
       if (signature.Direction == ConsumeDirection.Right) {  
         if (ix != 0) {  
           result.AddRange(flatElements.GetRange(0, ix));  
         }  
   
         if (signature.Consumes.IsExplicitIdentifier) {  
           result.Add(new ReductionElement(getReductionElement(ix), new IdentifierLiteral(paramPermutation[0].Identifier, LiteralTypeFor(paramPermutation[0].Identifier)), signature));  
         } else {  
           result.Add(new ReductionElement(getReductionElement(ix), paramPermutation[0], signature));  
         }  
   
         if (paramPermutation.Count != 1) {  
           result.AddRange(paramPermutation.GetRange(1, paramPermutation.Count - 1));  
         }  
   
         return result;  
       }  
   
       if (signature.Direction == ConsumeDirection.Left) {  
         if (paramPermutation.Count != 1) {  
           result.AddRange(paramPermutation.GetRange(0, paramPermutation.Count - 1));  
         }  
   
         if (signature.Consumes.IsExplicitIdentifier) {  
           result.Add(new ReductionElement(getReductionElement(ix), new IdentifierLiteral(paramPermutation.Last().Identifier, LiteralTypeFor(paramPermutation.Last().Identifier)), signature));  
         } else {  
           result.Add(new ReductionElement(getReductionElement(ix), paramPermutation.Last(), signature));  
         }  
   
         if (ix != flatElements.Count - 1) {  
           result.AddRange(flatElements.GetRange(ix + 1, flatElements.Count - ix - 1));  
         }  
   
         return result;  
       }  
   
       throw new ApplicationException("Got to unreachable code somehow...");  
     }  
   
     private static HashSet<MethodDefinition> HighestPriorityMethods(List<MethodDefinition> candidates, ConsumeDirection preferredDirection) {  
       if (candidates.Count <= 1) {  
         return new HashSet<MethodDefinition>(candidates);  
       }  
   
       HashSet<MethodDefinition> best = new HashSet<MethodDefinition>();  
       best.UnionWith(candidates.Where(def => def.Consumes == Type.ExecutionToken));  
       if (best.Any()) {  
         return best;  
       }  
   
       best.UnionWith(candidates.Where(def => def.Consumes.IsExplicitIdentifier && def.Direction == preferredDirection));  
       if (best.Any()) {  
         return best;  
       }  
   
       best.UnionWith(candidates.Where(def => def.Consumes.IsExplicitIdentifier));  
       if (best.Any()) {  
         return best;  
       }  
   
       var workset = candidates.Where(def => def.Direction == preferredDirection).ToList();  
       if (workset.Count == 0) {  
         workset = candidates;  
       }  
   
       if (workset.Count == 1) {  
         return new HashSet<MethodDefinition>(workset);  
       } else if (workset.Count > 1) {  
         var superBest = workset.Where(candidate => workset.TrueForAll(other => candidate == other || candidate.IsMoreSpecificThan(other)));  
         if (superBest.Any()) {  
           return new HashSet<MethodDefinition>(superBest);  
         } else {  
           return new HashSet<MethodDefinition>(workset);  
         }  
       }  
   
       return new HashSet<MethodDefinition>(candidates);  
     }  
   }  
   
   public static class IEnumerableSignalExt {  
     public static IEnumerable<T> Signal<T>(this IEnumerable<T> collection, Action<T> signal){  
       foreach (var entry in collection) {  
         signal(entry);  
         yield return entry;  
       }  
     }  
   }  
 }  

The yielding allows me to try the best paths first. If we find a solution, we can exist out early. Unfortunately for maintainers (me), there's like 3 different places that can happen. And then we usually end up about 10 steps deep in recursion within this construct.

And the saddest thing of all is that this is actually easier to debug than what was there. It's harder to maintain, and certainly harder to understand. But performance is exponentially better even though this nests loops like it's going out of style. And it will support features in the future like partial consumption of a sub-expression based on user-defined methods.

Monday, September 5, 2011

Long weekend - Part 2: Dynamic Dispatch

Alas, Sunday was a wasted day due to a bad (legit) windows 7 ISO burn during my computer upgrades. I got to reinstall the OS just so I could re-download and burn the iso (this time checking md5's).

But now I have a beefy new I7 sitting under my desk. Woot.

Today I went through and did some more compiler and code-gen bugfixes so that I can tell you things without being a damned dirty liar. Tangent now properly supports dynamic dispatch for simple enums. Adding to our last example:

Type foo => enum{ values{ a, b+c } };
stuff (x: foo) => void { debug "in stuff."; }
stuff (x: foo.a.Type) => void { debug "in a stuff."; } 
entrypoint => void {
  debug " + "\"entrypoint\"" + @";
  stuff foo.b+c;
  stuff foo.a;
}


Here we overload stuff so that it can take any foo, but if it gets foo.a then that more specific implementation should be used. So you get the nice output:

entrypoint
in stuff.
in a stuff.


Yes yes, this isn't actually dynamic dispatch; just overloading methods on parameters. Once I can actually store variables, it'll be dynamic dispatch (another post for another time). C# supports it if the method takes a dynamic, but that isn't statically checked, few people know about it, and it's not super important in C# (plus dynamic methods are limited in where they can be used). In Tangent though using type context for terms that are ambiguous in other languages (but clear to humans) is the core of what it's trying to experiment with. This sort of method refinement on types/capabilities is a big feature that is likely to be a large part of the language idioms as it grows.

The other thing shown here is the Type member. This is autogenerated onto every type at the moment. For constant/sealed things like enums, it is typed as simply another reference to the constant (anonymous) type that enum values have. For classes, it is typed to return a kind of T. This makes sense since we're guaranteed that any subtypes will fit into that kind. It allows us to get static typing when working with types of variables for cloning, serialization, and factory sort of things.

2 days left in the long weekend. I hope to get another little bit of work and a post done, but I suspect work and (moreso) Disgaea 4 will derail that plan.

Saturday, September 3, 2011

Long weekend - Part 1: A Tiny Taste of Evil

Alas long hours of work, piles of social obligations and a superhuman aptitude for laziness have kept me from working much on the language; let alone posting to a blog that gets a whole half-dozen hits per week. But I have a 5 day weekend, and while work will still draw me in for some of the time (and laziness most of the time) I do hope to get some Tangent work done. Hell, some of that might even be feature work instead of bug fixing.

Today for example I got the last of the compiler bugs dealt with such that I can compile and run a simple Tangent program:

Type foo => enum{ values{ a } };

stuff( x:foo ) => void{
  debug "in stuff.";
}

entrypoint => void{
  debug "in entrypoint.";
  stuff foo.a;
}


Here debug is a hacked in keyword to Console.WriteLine string constants. Hurrah laziness. This runs nicely and prints the two lines as you would expect. One thing that is nifty/evil about this incarnation of Tangent is the ability to use phrases in more places than method declarations. So even though we can only do basic things at the moment, all of the syntax inference exists to do evil things like:

Type foo bar => enum{ values{ a, a+b } };

stuff( x:foo bar ) => void{
  debug "in stuff.";
}

entrypoint => void{
  debug "in entrypoint.";
  stuff  foo bar.a + b;
}


This too prints the same two lines even in this elemental stage. Note that the added spaces in stuff foo bar.a + b; are not really necessary for the compiler. The inference of operations is done on Type, not spacing. You'll still need spaces to distinguish foo bar and foobar (2 identifiers vs 1) but symbols do not need spacing to distinguish them.

In this example, it's just mostly evil; dot as member access doesn't work fantastic for the literate syntax. The idea though is that it will provide better, cleaner code to read when programmers create phrases to describe their problem domain.

Hopefully the extra-long weekend will allow more work to show more stuff being more cool.