Saturday, July 23, 2011

I suck (and other musings)

Yes, it has been a long time. Yes, I've been doing very little on the project. Yes, I decided to restart again. Yes, I suck.

The last iteration, I managed to get completely turned around by kinds (that is, variables that store types). There are actually 3 sorts of types in the current iteration of Tangent:

- Types: Your (mostly) run of the mill types. They represent some possible set of values. 1, 3.14, "foo", etc.

- Type Literals: These are internal to the compiler. When you declare a new type, it's a type literal. The compiler knows exactly what it is, and marks it such that you can do operations on that (at compile time) and get a known answer out. So when you resolve the identifier 'string' you're accessing a variable (treated like any other static constant under the hood) for the type. This is what the last iteration neglected to understand.

- Kinds: These are type variables. Think C# generics. Tangent's Kinds require a type constraint (even if that constraint is 'anything'), but at that point they're a variable like any other. You can save them, modify them, use them as method parameters. The type checker uses the type constraint (just like C# does) to enforce that you only call methods on the type that the constraint has.
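
For the curious, here is a rough sketch of the "kind as a constrained variable" idea in Python rather than Tangent. The `Kind` class and its method names are made up for illustration, and the check happens at runtime here, whereas Tangent's type checker would enforce the constraint at compile time.

```python
class Kind:
    """Hypothetical stand-in for a Tangent kind: a variable that holds a type."""

    def __init__(self, constraint, value):
        self.constraint = constraint  # the type constraint ('object' means anything)
        self.value = None
        self.assign(value)

    def assign(self, value):
        # Reject anything that is not a type satisfying the constraint.
        if not (isinstance(value, type) and issubclass(value, self.constraint)):
            raise TypeError(f"{value!r} does not satisfy {self.constraint.__name__}")
        self.value = value


number_kind = Kind(int, int)   # constraint: int or one of its subtypes
number_kind.assign(bool)       # allowed: bool is a subtype of int in Python
anything = Kind(object, str)   # the 'anything' constraint accepts any type
```

The point is only that the kind is an ordinary variable you can reassign and pass around; the constraint is what limits which types it may hold.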

Now that I'm dealing with that properly, I have the compiler kinda sorta rebuilt. The only good thing about this is that I've decided to try and learn from the previous half dozen iterations; so the compiler actually compiles into an exe. Every other iteration either didn't get to runnable code, or interpreted the intermediary form right in C#. The interpretedness made the code dog slow (and I had to implement all the interpreting, making development slow), which was the death of that iteration.

Granted, the crime against humanity that I'm actually outputting is currently unoptimized and likely just as dog slow. For example:

  stuff foo.a; 

where stuff is a simple empty method call and foo.a represents an enum gets compiled out to...


return TangentObject.Global.Invoke(__tangentTypeCollection.__type0.ReductionRules[2], __tangentTypeCollection.__literal_'stuff').Invoke(__tangentTypeCollection.__type9.ReductionRules[0], TangentObject.Global.Invoke(__tangentTypeCollection.__type0.ReductionRules[6], __tangentTypeCollection.__literal_'.').Invoke(__tangentTypeCollection.__type25.ReductionRules[0], TangentObject.Global.Invoke(__tangentTypeCollection.__type0.ReductionRules[1], __tangentTypeCollection.__literal_'foo').Invoke(__tangentTypeCollection.__type6.ReductionRules[0], __tangentTypeCollection.__literal_)).Invoke(__tangentTypeCollection.__type26.ReductionRules[0], __tangentTypeCollection.__literal_'a').Invoke(__tangentTypeCollection.__type28.ReductionRules[0], __tangentTypeCollection.__literal_)).Invoke(__tangentTypeCollection.__type11.ReductionRules[0], __tangentTypeCollection.__literal_);


BLEARGH!

But it works, and it's a nice happy exe; and realistically optimization will (eventually) reduce that entirely to one direct method call.

Two other things of note since the last post involving syntax changes. I've reversed the order of identifier/types in variable declarations. It's now:


[variable]:[type]


Like pretty much every other language that includes a colon in the variable declaration. I imagine once I use it (or F#) the order will be less of an issue.

The other change involves type declarations. The old syntax used to be pretty much C# syntax:


[modifiers] class [: inheritance-list] (; | {[class-declaration-elements]*})


Since I expect anonymous types and type aliases to be more common in the language (and because I want to save myself work) it's been changed to the slightly simpler:


Goose? Type [method-declaration-element]+ => [type-expr];


So the actual declaration will look very similar to method declaration syntax. All of the rules for method declaration apply here. You must have 1 identifier or symbol, and then any number of other identifiers, symbols, or parameters. This I think provides a much nicer way to provide constructors to the language without running into multiple inheritance issues; the constructor is for the type, not the instance. If two constructors with the same name happen to lead to compatible types all the better:


Type foo( initial x : int ) => class {
  x : int = initial x;
};

Type foo => foo(42);

The simple class{} syntax is the anonymous type syntax, which can then be re-used within expressions. Inheritance will be done after the => like any other type operations. And since the language provides kinding support with user-defined operators, I expect those to be kinda awesome (and evil).

Saturday, January 22, 2011

New things

Sadly Tangent development often falls behind bills and food. Now that some time has opened up, we'll talk about actually doing some implementation. This blog was designed to step through those processes a little bit. Provide a mechanism to think about the feature, as well as provide some insight into implementation.

The feature that is the focus today is tentatively called Implicit Lambdas. C# 3 added a variety of extension methods to support operations over arbitrary collections. By default, they could look a bit... chunky:


    list.Where(x=>x.Color == Color.Blue).Select(x=>x.Position);

So the language designers introduced specialized syntax to deal with it:

    from x in list where x.Color == Color.Blue select x.Position;

This certainly looks a little better. Unfortunately, that's the only place that has special syntax. Any other methods that take another method as an argument need to use the full lambda syntax. Tangent has two problems with this. The first is that the parsing mechanism does not particularly lend itself to specialized keywords. The second is that Tangent's lambda syntax requires you specify the return type (so that the order of operation inference can work):


    (type:var,...)=>return-type{exprs}

Which means that we can't just ignore some sort of cleaner syntax to handle cases where lambdas make sense.


Enter Implicit Lambdas. The idea here is that since Tangent knows what method is taking the other method as an argument, we can let it specify what types it wants. Further, since almost every C# lambda ends up with something repetitive like:


    x => x.Color == Color.Blue

We can add some implicit scoping (and variable naming) to the mix. No need to specify x. It essentially (and under the covers) acts as though a new member method is being made on the fly for the type we're making a lambda for. So if we were to define our own where function in Tangent, it'd probably go like this:

    generic(any:T) (IEnumerable<T>: collection) where 
                   (implicit T -> bool: predicate) => yields<T> {
        foreach T:entry in collection {
            if predicate(entry) {
                yield entry;
            }
        }
    }


The modifier implicit here tells the compiler that we can use the automatic scope for that parameter. The scoping is opt-in rather than applied to every method parameter, because it doesn't make sense for everything. Sometimes it's useful to have the callsite explicitly show that there's a lambda there. So in Tangent, the linq query can be reduced to:


    list where Color == Blue select Position;

(assuming the Tangent convention of having Blue be a global/overload for the color, and select has had a similar treatment).
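
To make the scoping idea a bit more concrete, here is a small Python approximation. It is not how Tangent would implement it: the `Shape` class, the `where`/`select` helpers, and passing the expression as a string are all assumptions made for the sketch. What it shows is the core trick of evaluating the expression with the element's members in scope, so nobody has to name x.

```python
from dataclasses import dataclass


@dataclass
class Shape:
    color: str
    position: tuple


def where(items, expression):
    """Keep the items for which `expression` is true, with each item's fields in scope."""
    code = compile(expression, "<implicit lambda>", "eval")
    # vars(item) exposes the item's fields as local names for the expression.
    return [item for item in items if eval(code, {}, vars(item))]


def select(items, expression):
    """Evaluate `expression` once per item, again with the item's fields in scope."""
    code = compile(expression, "<implicit lambda>", "eval")
    return [eval(code, {}, vars(item)) for item in items]


shapes = [Shape("blue", (0, 1)), Shape("red", (2, 3)), Shape("blue", (4, 5))]
blue_positions = select(where(shapes, "color == 'blue'"), "position")
```

A string plus eval is a crude substitute for real compiler support, but the behavior at the callsite is roughly what the implicit modifier is after.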

This feature falls short in a couple of places where the LINQ syntax does not. It doesn't really deal well with multiple variables; you'll need to have the implicit type be a key, which will probably be awkward to deal with at the callsite. It also doesn't deal well when the implicit type you're acting on is actually the value you want to work with. Something like 'double every odd number in the list' is really awkward as it's currently designed.

On the upside this provides an arbitrary, user-definable way to capture expressions without nasty lambda syntax everywhere.

Sunday, December 12, 2010

Shiny things.

Now that some of the smaller core pieces are out of the way, we'll move on to the core feature that Tangent provides. The one feature that has consistently generated positive feedback, which grows out of the concepts we've discussed earlier. But first, let's provide a little historical background about how this feature developed.

One of the key things we've talked about already is the desire to provide arbitrarily named infix functions. Operator overloading on symbols leads to weird re-definition of behavior, yet entities (and games in general) want operations that work between types without necessarily being owned by either. Infix notation makes it a lot nicer. So we'd end up with something like:

(Ogre)smash(Knight);

Which is nice, readable and concise. Its definition was a little awkward, but doable:

public operator void smash(Actor subject, Actor target){ ... }

But the parens would get quickly out of hand once we start nesting these. Is it possible to just have:

Ogre smash Knight;

Since the order of operation inference doesn't really care about the parens, it's a simple matter to make them not required in the syntax.

But then the programmer would want to do something like this:

Ogre smash Knight with rock;

The only way to really make this work is to have the smash function return something that takes whatever type some global 'with' happens to have, which in turn returns something that takes a Weapon, and then do all the work to curry things together so all the parameters can be used at once.

That sucks.

So the thought went towards two elements to make that better. The first is kind of simple. 'with' in the above example would need to be some global with a special type (or something similar) to get picked up properly (and/or prevent another identifier with the same signature being valid). But I'm writing the compiler. Why not have something 'just work', or even generate those types myself?

No reason at all. Tangent thus provides the concept of explicit identifier parameters. A function can be specified to take the literal identifier 'with' then, which takes priority over variables in overload scenarios. For example (using current syntax):

public   foo('bar': bar) => void { ... }
public   foo(int: bar) => void { ... }

// later in code
local int: bar = 42;
foo bar;   // calls the 'bar' -> void version of foo!

But that still requires you to define the methods yourself and do all the currying. A whole lot of mechanical work, which of course promptly got pushed to computers. The big thing here is allowing definition of phrases:

public   (Actor: subject) smash (Actor: target) 
  with (Weapon: weapon) => void { ... }

So that the method definition looks like it would be called. The compiler does all of the work making curried sub-functions, and making 'with' here an explicit identifier parameter.
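
As a rough picture of the mechanical work being pushed to the compiler, here is a Python sketch that turns a phrase pattern into a chain of curried one-argument functions. The names `Word`, `define_phrase`, `Actor` and `Weapon` are invented for this example, and the checks happen at runtime; Tangent would resolve all of this at compile time.

```python
class Actor:
    def __init__(self, name):
        self.name = name


class Weapon:
    def __init__(self, name):
        self.name = name


class Word:
    """A literal identifier in a phrase, such as 'smash' or 'with'."""

    def __init__(self, text):
        self.text = text


def define_phrase(pattern, body):
    """Build curried sub-functions that accept one term of the phrase at a time."""

    def step(index, args):
        if index == len(pattern):
            return body(*args)  # every term has been supplied: run the body
        expected = pattern[index]

        def accept(term):
            if isinstance(expected, Word):
                # Literal identifiers must match exactly and carry no value.
                if term != expected.text:
                    raise TypeError(f"expected the identifier {expected.text!r}")
                return step(index + 1, args)
            if not isinstance(term, expected):
                raise TypeError(f"expected a {expected.__name__}")
            return step(index + 1, args + (term,))

        return accept

    return step(0, ())


smash_phrase = define_phrase(
    [Actor, Word("smash"), Actor, Word("with"), Weapon],
    lambda subject, target, weapon:
        f"{subject.name} smashes {target.name} with {weapon.name}",
)
result = smash_phrase(Actor("Ogre"))("smash")(Actor("Knight"))("with")(Weapon("rock"))
```

Each call consumes one term and hands back the function for the next, which is the currying nobody wants to write by hand.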

In the end, we have something that looks like natural language. More importantly, we have something that behaves more like natural language. Tangent has the concept of 'makes sense'. If a statement doesn't shake out to void, it gets tossed at compile time. And the compiler will shake a statement until it does end up void, properly using the terms in their correct context, and handling the meaning overloads that are inherent to natural language. But nobody will use it if it is a pain. Phrase definition should provide a simple, intuitive mechanism for programmers to say what they want.

Having things closer to natural language should make the language very adaptable when it comes to building domain specific languages in Tangent. A smaller gap between the domain and the code should reduce translation errors, as well as the spin up time for new programmers. That holds as long as the drive towards natural language doesn't make the design forget that it is still a programming language.

Assuming that the natural language assumptions are correct, the question then becomes whether the order of operation inference makes code too hard for programmers to read, debug, or even write. Unfortunately that is something that I think can only be determined by writing code in the thing.

Tuesday, November 30, 2010

Type System basics: Multiple Inheritance and you.

Sadly, I've not been writing as much as I should. More sadly, I've not been working on Tangent as much as I should.

Last time I went over some of the simple tidbits of the type system. Now I'd like to go over some of the... more controversial features; almost all surround Tangent's ability to do multiple inheritance. But first, let's quickly calm those of you out there questioning my sanity with torches and pitchforks.

The original motivation for the language was component based entities in games. One of the great problems there is how to get the components together into stuff that does more than its parts. And how to get those parts working together. Modern languages make it difficult to do simple type composition that re-exposes the behaviors while sharing certain logical concepts. Multiple inheritance is always tried, but quickly disposed of because... well it sucks. It seemed as if multiple inheritance would be a fine solution if it didn't suck, so Tangent aims for that.

One of the most common threads about component based entities is how to share/reuse some components that seem dreadfully common. Position for example. If you have a component that handles movement, it needs to adjust some position. If you have a physics handler too, then it needs that Position; as well as a renderer, AI... Often times the position just gets dumped into the entity itself. Not a huge deal for Position, but rather self defeating on less ubiquitous traits.

Tangent addresses this problem by providing property access to fields, and supporting some syntactic sugar:


public class Moveable{
    public abstract Point  Position;
    public Move(Direction to) => void{ ... }
}

Here, Position looks like an abstract field (and in earlier models, actually was). It is actually sugar to require a read/write property. Since fields expose the property, it allows the programmer to implement the requirement however they like. Better yet it makes things consistent within the language itself.

Moveable then has its dependent component encoded into the type system. As long as it is aggregated with something that implements the abstract property, it will work happily.
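
The same shape can be approximated in Python with an abstract read/write property. This is only an analogy: the `Crate` and `Tracked` classes are invented for the example, and Python enforces the requirement when you instantiate rather than at compile time.

```python
from abc import ABC, abstractmethod


class Moveable(ABC):
    """Component that requires a read/write position from whatever it is combined with."""

    @property
    @abstractmethod
    def position(self): ...

    @position.setter
    @abstractmethod
    def position(self, value): ...

    def move(self, dx, dy):
        x, y = self.position
        self.position = (x + dx, y + dy)


class Crate(Moveable):
    """Satisfies the requirement with a plain field."""
    position = (0, 0)


class Tracked(Moveable):
    """Satisfies the same requirement with a property that records history."""

    def __init__(self):
        self._position = (0, 0)
        self.history = []

    @property
    def position(self):
        return self._position

    @position.setter
    def position(self, value):
        self.history.append(self._position)
        self._position = value


crate = Crate()
crate.move(1, 2)
tracked = Tracked()
tracked.move(3, 4)
```

Moveable never knows (or cares) whether the position is a bare field or something fancier, which is the whole point of treating fields as properties.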

Which leads us to some of the common multiple inheritance problems. Tangent has two cases of the Diamond Problem. The first:

public goose class A{}
public goose class B: A{}
public goose class C: A{}
public goose class D: C with B{}

This isn't really multiple inheritance. The 'with' operation creates an anonymous type where C inherits B. The dispatch then proceeds like single inheritance would. Using B's method in certain cases requires you to explicitly override it in D and then invoke that version (syntax TBD).
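
Python's method resolution order happens to give a similar picture for this diamond, so it makes a handy illustration. To be clear, this is Python's C3 linearization, not Tangent's mechanism, and the `who` method and `D2` class are made up for the example.

```python
class A:
    def who(self):
        return "A"


class B(A):
    def who(self):
        return "B"


class C(A):
    def who(self):
        return "C"


class D(C, B):
    """Roughly 'C with B': lookup goes D, then C, then B, then A."""


class D2(C, B):
    def who(self):
        # Explicitly override and pick B's version when that is what we want.
        return B.who(self)


lookup_order = [cls.__name__ for cls in D.__mro__]
```

The diamond is flattened into a single chain, so dispatch behaves like single inheritance with C in front of B.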

The second:

public class A{}
public class B{} // implements A
public class C{} // implements A
public class D{} // implements B and C

public  foo(A:arg)=>void{}
public  foo(B:arg)=>void{}
public  foo(C:arg)=>void{}

//somewhere in code
local A: d = new D;
foo(d);


Since Tangent supports dynamic dispatch on parameters, the runtime type of d is used to determine what overload to use. Sadly, there's no great solution here. If the local was declared as D you would get a compile time error. With this code, you will get an exception. Static analysis can identify methods that aren't 'closed' as far as the type system is concerned. It'll likely be a compiler flag.
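
Here is a small Python sketch of that failure mode. The `dispatch` helper and `AmbiguousCall` exception are hypothetical, and the classes use ordinary Python inheritance instead of Tangent's structural 'implements', but the ambiguity is the same: the runtime type D matches both the B and C overloads and neither is more specific.

```python
class A: pass
class B(A): pass
class C(A): pass
class D(B, C): pass


class AmbiguousCall(Exception):
    pass


def dispatch(overloads, arg):
    """Pick the most specific overload for the runtime type of `arg`."""
    applicable = [t for t in overloads if isinstance(arg, t)]
    # Drop any candidate that has a more specific candidate beneath it.
    best = [t for t in applicable
            if not any(o is not t and issubclass(o, t) for o in applicable)]
    if len(best) != 1:
        names = [t.__name__ for t in best]
        raise AmbiguousCall(f"no single best overload, candidates: {names}")
    return overloads[best[0]](arg)


foo = {
    A: lambda arg: "foo(A)",
    B: lambda arg: "foo(B)",
    C: lambda arg: "foo(C)",
}

picked = dispatch(foo, B())   # B beats A, so foo(B) is chosen
# dispatch(foo, D()) raises AmbiguousCall: B and C are equally specific
```

A static 'closedness' check would amount to running the same specificity test over every type that could reach the call, rather than waiting for the exception.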

The other common issue is what order to run constructors/destructors/etc. Here is where Tangent takes a little deviation. There are no constructors. Tangent allows only field specific initializers. The inheritor (or the left side of the 'with' operation) wins if there's a collision on non-private fields.

Combined with some of the other language features to accommodate the 'I need some value to initialize the invariant!' use of constructors, this should provide a good mechanism for type composition without many of the headaches found in other implementations (but surely a few of its own).

Here's hoping for more frequent work/posts!

Saturday, November 6, 2010

Get your ducks in a row: Type System basics

When starting out thinking through what Tangent needed to be, I started with the type system. When I went to implement the thing and ran into issues, they got worked out by starting with the type system.

This isn't exactly a huge surprise. The type system is what prevents a lot of good options when doing component based architecture in gamedev (the original language motivator). There's a not insignificant push to dynamic languages because of problems with static typing. In my experience though, dynamic languages tend to fall down in practice with larger systems and/or more programmers.

That isn't to say there aren't improvements to be had. That is what Tangent aims for: a flexible version of static typing that gets in your way less.

The most straightforward change is to use a Structural Type System. Simply put, if you have the C# types:


public interface SomethingWithName{
    string Name{get;}
}

public class Pirate{
   public string Name{get; set;}
}


In a structural type system, Pirate is an acceptable subtype of SomethingWithName despite not explicitly inheriting from it. This is pretty similar to duck typing found in dynamic languages, but is statically checked. It allows more generic code, since methods can focus on the parts they need, however the object supplies them. The writer doesn't need to know what that is, and the consumer of the method doesn't need to inherit the right interface just to use something that would work fine.
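
Python's typing.Protocol offers something close to this, which makes for a quick runnable illustration. The `greet` function is invented for the example; static checkers verify the match ahead of time, and the `runtime_checkable` decorator additionally lets isinstance check for the member at runtime.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SomethingWithName(Protocol):
    name: str


class Pirate:
    """Never mentions SomethingWithName, but has the right shape."""

    def __init__(self, name):
        self.name = name


class Rock:
    """Has no name, so it does not fit."""


def greet(thing: SomethingWithName) -> str:
    # Only relies on the part of the object the interface asks for.
    return f"Ahoy, {thing.name}!"


greeting = greet(Pirate("Anne"))
pirate_fits = isinstance(Pirate("Anne"), SomethingWithName)
rock_fits = isinstance(Rock(), SomethingWithName)
```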

Unless it doesn't. There is certainly the case where a type might meet the criteria for an interface, but not implement the 'spirit' of the interface. You'll then get some happy runtime errors when the generic method does the entirely wrong thing. To avoid some of that (and to help inter-op with other languages) Tangent also provides a mechanism to disable that behavior. In version 1, the keyword goose (read: not duck) was used to tag a type declaration. Types with the goose tag require that a subtype explicitly inherit from it to be considered a subtype; just like C# and its nominally typed kin. Unless I hear a better option, goose is likely to remain the keyword.


For next time, we'll go over some of the other features of the type system, as well as our first unpopular compromise.

Friday, November 5, 2010

Introductions

Since you're here, I presume you're curious about what Tangent is, and if you should spend your very valuable web browsing time to care. The original motivation for the language was the difficulty in making component based designs for game development. That led to a few experiments, which led to wanting to try out some features, which led to Version 1. Version 1 involved quite a bit of fumbling around, bad ideas, and the prototyping you'd expect from a version 1. This journal will focus on the design and development of version 2.


Tangent is designed to be a general purpose programming language. It will end up being 'higher level' than C# and Java. It is statically typed. Beyond that, it is fairly peculiar and hard to categorize. The core concept that many of the features build off of is Order of Operation Inference.

Type Inference is a well known feature of programming languages. You have a set of known operations on known types in a known order and the compiler figures out what the resultant type of those operations is. Order of Operation Inference turns that on its head. Tangent forces each statement to result in void. It also compiles all of the Type information before compiling the Executable information. Now, with a known resultant type and known operations on known types, the compiler figures out which order of those operations 'works' (with a few constraints/preferences to cut down on ambiguity/the search pool).
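
A toy version of the search can be sketched in a few lines of Python. This is a brute-force illustration under my own assumptions, not Tangent's algorithm: the `Fn` type, the `infer` function, and the rule that a function may take an argument from either side are all made up for the example, and there are no preferences to prune the search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fn:
    """Type of a one-argument function: takes `param`, produces `result`."""
    param: object
    result: object


def infer(terms):
    """Find a grouping of (text, type) terms that reduces the statement to 'void'."""
    if len(terms) == 1:
        text, term_type = terms[0]
        return text if term_type == "void" else None
    for i in range(len(terms) - 1):
        (left, left_type), (right, right_type) = terms[i], terms[i + 1]
        # Try each neighbour as the function applied to the other one.
        for fn_type, arg_type in ((left_type, right_type), (right_type, left_type)):
            if isinstance(fn_type, Fn) and fn_type.param == arg_type:
                merged = (f"({left} {right})", fn_type.result)
                found = infer(terms[:i] + [merged] + terms[i + 2:])
                if found is not None:
                    return found
    return None  # no order of operations ends in void


statement = [
    ("print", Fn("int", "void")),
    ("1", "int"),
    ("plus", Fn("int", Fn("int", "int"))),
    ("2", "int"),
]
grouping = infer(statement)   # "(print ((1 plus) 2))"
```

Reading left to right would apply print to 1 first and get stuck with a dangling plus 2; the search backs out of that and finds the grouping that actually ends in void.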

The original motivation for this was to allow user defined free functions that could act as infix operators and have arbitrary names. Unfortunately, that meant that order of operations on arbitrary operators is pretty much unusable. You either just read the things from left to right (which sucks when infix operators were the main goal and leads to Lisp-like paren overload usually anyways) or you have the programmer specify some priority (which never works, since the priority has to line up correctly with random priorities set by other programmers). Inferring the order of operations provides a mechanism to solve that problem without forcing the programmer to do more than they would normally do. Further, it provides interesting behavior that can be utilized for other features.


Next time, we'll go into some of the type system basics, as well as some of the practical implications of Order of Operation Inference.