Thursday, September 15, 2011

The dark depths of this beast.



First a warning for those with heart conditions, epilepsy, or are pregnant; this post contains code snippets that have maimed the hale and healthy. YOU'VE BEEN WARNED!

One of the things I noticed while doing the last bit of work is how often the core part of the compiler iterates over the same damned thing. It needs to take a sequence of tokens and figure out what order of execution is necessary so that they 'make sense'. It's the core idea of the language, and making it work well, reliably, and fast/scalably is going to be key to the thing being viable. And frankly, its the sort of thing that the experimental language is there to experiment with.

What I was doing was simply iterating through the collection of tokens, if I found one to reduce, I did the reduction and then via recursion worked on the now smaller collection. Eventually I'd try everything and it'd fail, or I'd find what I was looking for and exit out early.

That though starts at the beginning each time. And it doesn't take any sort of type info into account when reducing things to see if we're "closer" to our answer. And it provides poor support for partial consumption, like the implicit feature I discussed earlier (this was actually the downfall of the previous iteration).

So I spend the past few days thinking and tonight redoing that core bit of code. And I now have the most horrible C# I've ever seen:

 using System;  
 using System.Collections.Generic;  
 using System.Linq;  
 using System.Text;  
 using Tangent.Lang.Compilation.IntermediateForm;  
 using System.CodeDom.Compiler;  
 using System.Threading.Tasks;  
 using System.Collections.Concurrent;  
   
 namespace Tangent.Lang.Compilation {  
   public partial class Compiler {  
     private IEnumerable<List<List<Element>>> UnflattenElements(List<Element> flatElements, Type targetType, List<Type> scope, CompilerErrorCollection errors, ConsumeDirection consumeDirection) {  
   
       System.Diagnostics.Debug.WriteLine("[{0}] <- {1}", targetType, string.Join(":", flatElements));  
   
       if (targetType == Type.ExecutionToken) {  
         yield return new List<List<Element>>() { flatElements };  
         yield break;  
       }  
   
       if (flatElements.Count == 0) {  
         yield break;  
       }  
   
       Element test = null;  
       if (consumeDirection == ConsumeDirection.Right) {  
         test = flatElements.First();  
       } else {  
         test = flatElements.Last();  
       }  
   
       if (targetType.IsExplicitIdentifier) {  
         if (!string.IsNullOrWhiteSpace(test.Identifier) && targetType == LiteralTypeFor(test.Identifier)) {  
           List<Element> rework = new List<Element>(flatElements);  
           rework[consumeDirection == ConsumeDirection.Right ? 0 : flatElements.Count - 1] = new IdentifierLiteral(test.Identifier, LiteralTypeFor(test.Identifier));  
           yield return new List<List<Element>>() { rework };  
           yield break;  
         }  
       }  
   
       if (flatElements.Count == 1) {  
         if (flatElements[0].EffectiveType != null && flatElements[0].EffectiveType.IsSubTypeOf(targetType)) {  
           yield return new List<List<Element>>() { flatElements };  
           yield break;  
         }  
       }  
   
       if (test.EffectiveType != null && test.EffectiveType.IsSubTypeOf(targetType)) {  
         yield return new List<List<Element>>() { flatElements };  
       }  
   
       int start = consumeDirection == ConsumeDirection.Right ? 0 : flatElements.Count - 1;  
       int end = consumeDirection == ConsumeDirection.Right ? flatElements.Count : -1;  
       int step = consumeDirection == ConsumeDirection.Right ? 1 : -1;  
       for (int i = start; i != end; i += step) {  
         IEnumerable<List<MethodDefinition>> possibilityStack = null;  
         List<Element> currentResolutionState = null;  
         if (flatElements[i] is ReResolveElement) {  
           possibilityStack = ResolveIdentifier(flatElements[i].Identifier, scope, errors)  
             .Signal(elements => currentResolutionState=elements)  
             .Select(elements => elements.Select(element => element.EffectiveType.ReductionRules[0]).ToList());  
   
         } else if (flatElements[i].EffectiveType.ReductionRules.Any()) {  
           possibilityStack = new List<List<MethodDefinition>>() { new List<MethodDefinition>(flatElements[i].EffectiveType.ReductionRules) };  
         }  
   
         if (possibilityStack != null) {  
           foreach (var possibles in possibilityStack) {  
   
             while (possibles.Any()) {  
               HashSet<MethodDefinition> best = HighestPriorityMethods(possibles, consumeDirection == ConsumeDirection.Right ? ConsumeDirection.Left : ConsumeDirection.Right);  
               possibles.RemoveAll(def => best.Contains(def));  
               List<Element> paramElements;  
   
               if (best.First().Direction == ConsumeDirection.Right) {  
                 if (i == flatElements.Count - 1) {  
                   paramElements = new List<Element>();  
                 } else {  
                   paramElements = new List<Element>(flatElements.GetRange(i + 1, flatElements.Count - i - 1));  
                 }  
               } else {  
                 if (i == 0) {  
                   paramElements = new List<Element>();  
                 } else {  
                   paramElements = new List<Element>(flatElements.GetRange(0, i));  
                 }  
               }  
   
               Dictionary<MethodDefinition, IEnumerator<List<List<Element>>>> possibleParamPermutations = new Dictionary<MethodDefinition, IEnumerator<List<List<Element>>>>();  
               foreach (var entry in best) {  
                 possibleParamPermutations[entry] = UnflattenElements(paramElements, entry.Consumes, scope, errors, entry.Direction).GetEnumerator();  
               }  
   
               while (possibleParamPermutations.Any()) {  
                 bool permutationsLeft = false;  
                 do {  
                   permutationsLeft = false;  
                   List<List<Element>> passback = new List<List<Element>>();  
                   Dictionary<MethodDefinition, IEnumerator<List<List<Element>>>> possibleSolutions = new Dictionary<MethodDefinition, IEnumerator<List<List<Element>>>>();  
   
                   foreach (var entry in best.Where(def => possibleParamPermutations.ContainsKey(def))) {  
                       
                     if (possibleParamPermutations[entry].MoveNext()) {  
                       permutationsLeft = true;  
                       foreach (var paramPermutation in possibleParamPermutations[entry].Current) {  
                         possibleSolutions[entry] = UnflattenElements(Reduce(flatElements, i, paramPermutation, entry, currentResolutionState), targetType, scope, errors, consumeDirection).GetEnumerator();  
                         if (!possibleSolutions[entry].MoveNext()) {  
                           possibleSolutions.Remove(entry);  
                         }  
                       }  
                     } else {  
                       // no op; eventually permutationsLeft will remain false.  
                       possibleParamPermutations.Remove(entry);  
                     }  
                   }  
   
                   if (possibleSolutions.Any()) {  
                     List<MethodDefinition> solutionDefs = new List<MethodDefinition>(possibleSolutions.Keys);  
                     while (solutionDefs.Any()) {  
                       HashSet<MethodDefinition> bestSolutions = HighestPriorityMethods(solutionDefs, consumeDirection == ConsumeDirection.Right ? ConsumeDirection.Left : ConsumeDirection.Right);  
                       solutionDefs.RemoveAll(def => bestSolutions.Contains(def));  
   
                       List<MethodDefinition> cleanup = new List<MethodDefinition>();  
                       while (bestSolutions.Any()) {  
                           
                         foreach (var bestSol in bestSolutions) {  
                           System.Diagnostics.Debug.WriteLine(string.Format("({2}) {0} : [{1}]", bestSol, string.Join(" =||= ", possibleSolutions[bestSol].Current.Select(line => string.Join(":", line))), flatElements.Count));  
                           passback.AddRange(possibleSolutions[bestSol].Current);  
                           if (!possibleSolutions[bestSol].MoveNext()) {  
                             cleanup.Add(bestSol);    
                           }  
                         }  
   
                         bestSolutions.RemoveWhere(key=>cleanup.Contains(key));  
                         cleanup.Clear();  
                         yield return passback;  
                       }  
                     }  
                   }  
                 } while (permutationsLeft == true);  
               }  
             }  
           }  
         }  
       }  
     }  
   
     private List<Element> Reduce(List<Element> flatElements, int ix, List<Element> paramPermutation, MethodDefinition signature, List<Element> currentResolutionState) {  
       List<Element> result = new List<Element>();  
       Func<int, Element> getReductionElement = (i) => {  
         if (flatElements[i] is ReResolveElement) {  
           if (currentResolutionState == null) {  
             throw new ArgumentNullException();  
           }  
   
           return currentResolutionState.Where(e => e.EffectiveType.ReductionRules[0] == signature).First();  
         } else {  
           return flatElements[i];  
         }  
       };  
         
       if (signature.Consumes == Type.ExecutionToken) {  
   
         if (ix != 0) {  
           result.AddRange(flatElements.GetRange(0, ix));  
         }  
   
         result.Add(new ReductionElement(getReductionElement(ix), new Literal("", Type.ExecutionToken, null), signature));  
   
         if (ix + 1 != flatElements.Count) {  
           result.AddRange(flatElements.GetRange(ix + 1, flatElements.Count - ix - 1));  
         }  
   
         return result;  
       }  
   
       if (signature.Direction == ConsumeDirection.Right) {  
         if (ix != 0) {  
           result.AddRange(flatElements.GetRange(0, ix));  
         }  
   
         if (signature.Consumes.IsExplicitIdentifier) {  
           result.Add(new ReductionElement(getReductionElement(ix), new IdentifierLiteral(paramPermutation[0].Identifier, LiteralTypeFor(paramPermutation[0].Identifier)), signature));  
         } else {  
           result.Add(new ReductionElement(getReductionElement(ix), paramPermutation[0], signature));  
         }  
   
         if (paramPermutation.Count != 1) {  
           result.AddRange(paramPermutation.GetRange(1, paramPermutation.Count - 1));  
         }  
   
         return result;  
       }  
   
       if (signature.Direction == ConsumeDirection.Left) {  
         if (paramPermutation.Count != 1) {  
           result.AddRange(paramPermutation.GetRange(0, paramPermutation.Count - 1));  
         }  
   
         if (signature.Consumes.IsExplicitIdentifier) {  
           result.Add(new ReductionElement(getReductionElement(ix), new IdentifierLiteral(paramPermutation.Last().Identifier, LiteralTypeFor(paramPermutation.Last().Identifier)), signature));  
         } else {  
           result.Add(new ReductionElement(getReductionElement(ix), paramPermutation.Last(), signature));  
         }  
   
         if (ix != flatElements.Count - 1) {  
           result.AddRange(flatElements.GetRange(ix + 1, flatElements.Count - ix - 1));  
         }  
   
         return result;  
       }  
   
       throw new ApplicationException("Got to unreachable code somehow...");  
     }  
   
     private static HashSet<MethodDefinition> HighestPriorityMethods(List<MethodDefinition> candidates, ConsumeDirection preferredDirection) {  
       if (candidates.Count <= 1) {  
         return new HashSet<MethodDefinition>(candidates);  
       }  
   
       HashSet<MethodDefinition> best = new HashSet<MethodDefinition>();  
       best.UnionWith(candidates.Where(def => def.Consumes == Type.ExecutionToken));  
       if (best.Any()) {  
         return best;  
       }  
   
       best.UnionWith(candidates.Where(def => def.Consumes.IsExplicitIdentifier && def.Direction == preferredDirection));  
       if (best.Any()) {  
         return best;  
       }  
   
       best.UnionWith(candidates.Where(def => def.Consumes.IsExplicitIdentifier));  
       if (best.Any()) {  
         return best;  
       }  
   
       var workset = candidates.Where(def => def.Direction == preferredDirection).ToList();  
       if (workset.Count == 0) {  
         workset = candidates;  
       }  
   
       if (workset.Count == 1) {  
         return new HashSet<MethodDefinition>(workset);  
       } else if (workset.Count > 1) {  
         var superBest = workset.Where(candidate => workset.TrueForAll(other => candidate == other || candidate.IsMoreSpecificThan(other)));  
         if (superBest.Any()) {  
           return new HashSet<MethodDefinition>(superBest);  
         } else {  
           return new HashSet<MethodDefinition>(workset);  
         }  
       }  
   
       return new HashSet<MethodDefinition>(candidates);  
     }  
   }  
   
   public static class IEnumerableSignalExt {  
     public static IEnumerable<T> Signal<T>(this IEnumerable<T> collection, Action<T> signal){  
       foreach (var entry in collection) {  
         signal(entry);  
         yield return entry;  
       }  
     }  
   }  
 }  

The yielding allows me to try the best paths first. If we find a solution, we can exist out early. Unfortunately for maintainers (me), there's like 3 different places that can happen. And then we usually end up about 10 steps deep in recursion within this construct.

And the saddest thing of all is that this is actually easier to debug than what was there. It's harder to maintain, and certainly harder to understand. But performance is exponentially better even though this nests loops like it's going out of style. And it will support features in the future like partial consumption of a sub-expression based on user-defined methods.

Monday, September 5, 2011

Long weekend - Part 2: Dynamic Dispatch

Alas, Sunday was a wasted day due to a bad (legit) windows 7 ISO burn during my computer upgrades. I got to reinstall the OS just so I could re-download and burn the iso (this time checking md5's).

But now I have a beefy new I7 sitting under my desk. Woot.

Today I went through and did some more compiler and code-gen bugfixes so that I can tell you things without being a damned dirty liar. Tangent now properly supports dynamic dispatch for simple enums. Adding to our last example:

Type foo => enum{ values{ a, b+c } };
stuff (x: foo) => void { debug "in stuff."; }
stuff (x: foo.a.Type) => void { debug "in a stuff."; } 
entrypoint => void {
  debug " + "\"entrypoint\"" + @";
  stuff foo.b+c;
  stuff foo.a;
}


Here we overload stuff so that it can take any foo, but if it gets foo.a then that more specific implementation should be used. So you get the nice output:

entrypoint
in stuff.
in a stuff.


Yes yes, this isn't actually dynamic dispatch; just overloading methods on parameters. Once I can actually store variables, it'll be dynamic dispatch (another post for another time). C# supports it if the method takes a dynamic, but that isn't statically checked, few people know about it, and it's not super important in C# (plus dynamic methods are limited in where they can be used). In Tangent though using type context for terms that are ambiguous in other languages (but clear to humans) is the core of what it's trying to experiment with. This sort of method refinement on types/capabilities is a big feature that is likely to be a large part of the language idioms as it grows.

The other thing shown here is the Type member. This is autogenerated onto every type at the moment. For constant/sealed things like enums, it is typed as simply another reference to the constant (anonymous) type that enum values have. For classes, it is typed to return a kind of T. This makes sense since we're guaranteed that any subtypes will fit into that kind. It allows us to get static typing when working with types of variables for cloning, serialization, and factory sort of things.

2 days left in the long weekend. I hope to get another little bit of work and a post done, but I suspect work and (moreso) Disgaea 4 will derail that plan.

Saturday, September 3, 2011

Long weekend - Part 1: A Tiny Taste of Evil

Alas long hours of work, piles of social obligations and a superhuman aptitude for laziness have kept me from working much on the language; let alone posting to a blog that gets a whole half-dozen hits per week. But I have a 5 day weekend, and while work will still draw me in for some of the time (and laziness most of the time) I do hope to get some Tangent work done. Hell, some of that might even be feature work instead of bug fixing.

Today for example I got the last of the compiler bugs dealt with such that I can compile and run a simple Tangent program:

Type foo => enum{ values{ a } };

stuff( x:foo ) => void{
  debug "in stuff.";
}

entrypoint => void{
  debug "in entrypoint.";
  stuff foo.a;
}


Here debug is a hacked in keyword to Console.WriteLine string constants. Hurrah laziness. This runs nicely and prints the two lines as you would expect. One thing that is nifty/evil about this incarnation of Tangent is the ability to use phrases in more places than method declarations. So even though we can only do basic things at the moment, all of the syntax inference exists to do evil things like:

Type foo bar => enum{ values{ a, a+b } };

stuff( x:foo bar ) => void{
  debug "in stuff.";
}

entrypoint => void{
  debug "in entrypoint.";
  stuff  foo bar.a + b;
}


This too prints the same two lines even in this elemental stage. Note that the added spaces in stuff foo bar.a + b; are not really necessary for the compiler. The inference of operations is done on Type, not spacing. You'll still need spaces to distinguish foo bar and foobar (2 identifiers vs 1) but symbols do not need spacing to distinguish them.

In this example, it's just mostly evil; dot as member access doesn't work fantastic for the literate syntax. The idea though is that it will provide better, cleaner code to read when programmers create phrases to describe their problem domain.

Hopefully the extra-long weekend will allow more work to show more stuff being more cool.