Sunday, May 31, 2015

Algebraic Data Types

So about a month ago, I posted about the addition of product types to Tangent. These allow you to combine data together into things that look like classes. Well today is the other side of that coin, sum types. These let your data be either one type or another. Together, product types and sum types provide what is known as Algebraic Data Types, which functional languages have long touted for their expressiveness. Types can then be defined like "Foo is a string and a datetime or a string and a timespan".

The example test code for today:

int list :> int | (a: int),(b: int list) {
  (this).head => int { a }
  (this).tail => int list { b }
  print (this) => void {
    print this.head;
    print this.tail;

print (il: int list) => void {}

entrypoint => void {
  print 1,2,3;

This defines a nice simple (non-empty) list of integers and functions to print the list. The pipe ('|') is used as the type constructor for sum types in type declarations. So the declaration above is read as "int list is either an int or an int and an int list". The recursive type then cascades out indefinitely. You'll find this sort of thing all over Lisp and other functional languages. Here, I've defined the comma as the list constructor. And because of how Tangent works, list constructs are directly usable in code (as in the entrypoint). No boatloads of parens necessary like in Lisp. No need to extend the syntax like C# and other languages with special list initialization syntax. You can do it right in-language with Tangent.

Alas, there's still some cruft around things. First, is the need for the empty print function for int lists. Right now there's no real code for working with sum types. The only thing that is there is specialization logic like there is for single value types. So the "base" function is needed for the code to compile against, but the compiler is smart enough to dispatch calls to more specific versions of the function (int and the anonymous product type in this case). Not great.

Also not great is the passthroughs for head and tail on the anonymous product type. I'd like to have some syntax that would automatically re-expose constructor parameters, but I'm not sure what it is, or what it would look like.

And lastly, this is a pretty specific implementation. Lists should be generic. I have a good idea about what the syntax will look for that, and I expect that stuff to be the next large chunk of code worked on.

Why are these things here you ask? Sure, this sort of code seems pretty academic. After all, I don't expect people to use these lists instead of .NET's List. The main motivation for these things is so I don't have null in the language. The pains of null are well documented, and I'm curious how thing would work in a world where you need explicit nullable types. Beyond that, I hope that with sum types I don't need to implement exceptions. If your function can return something or an error, use a sum type that says that explicitly. That's very like Java checked exceptions, which suck hard, but I'm curious if the language flexibility can't lessen that pain in practice.

Anyways, these are now available in source control.

One bug I ran into was the lack of type checking in CIL. It happily sent the wrong parameter into a function and the function happily ran it, leading to badness. Similarly, if you don't unbox value types, they too will happily run the function on the wrong bits of data. Something to remember in the future since it was troublesome to track down. I also imagine that I could use such behavior in hacky terrible ways should they be necessary.

Also, sum types were a pain to implement. Product types are fairly simple, since the "does the type of that match what we need" logic can work the same as it did with simple types. Sum types required fairly distributed changes to a lot of different code since they add not only a selection step to type matching, but a layer of hierarchy to the type structure (though that was maybe since I insisted on anonymous types like above). It was interesting to see, and may be why so few languages implement them.

No comments:

Post a Comment