Caterwaul is a pure Javascript compiler that lets you change the semantics of functions. To do this it implements a modular decompiler, macroexpander, and compiler that allow you to manipulate code in a first-class way. It also comes with several macro (Lisp-style, not C-style) libraries to make Javascript more fun (though you can easily disable them and/or write your own).
A shell is available to interactively use Caterwaul while reading the tutorial below.
Caterwaul's core macro set starts by extending Javascript syntax in some helpful ways. In particular, it enables quick function assignment and Ruby-style string interpolation:
f(x) = x + 1
String.prototype.say_hi() = 'hi from #{this}!'
Caterwaul translates these expressions into this:
f = function (x) { return x + 1; };
String.prototype.say_hi = function () { return 'hi from ' + (this) + '!'; };
String interpolation and function assignment are the only irregular syntactic forms provided by Caterwaul. Everything else is implemented as a regular form called a modifier.
A modifier is a word that is used with an operator to modify a bit of syntax. For example, Caterwaul provides a modifier called when to execute things conditionally:
log('hi') -when['foo'.length === 3]
There are two parts to a modifier. The first is the operator you use with it (in this case minus), and the second is the modifier and any arguments it takes. The operator is very important: its precedence determines how much of the surrounding code the modifier applies to. For example:
log('hi'), log('again') -when[1 === 2]
Here the when[1 === 2] only modifies log('again') because minus has much higher precedence than the comma operator. However, Caterwaul lets you use several other operators to change this:
log('hi'), log('again'), when[1 === 2]
In this case the when[1 === 2] modifies both log statements. The reason for this is kind of subtle: comma left-associates, so the first comma was collapsed into a single syntax node that then became the left-hand side of the second comma. Because Caterwaul operates on the structure of your code, it groups both log statements into the conditional.
There are seven different operators you can use in conjunction with a modifier. From highest to lowest precedence they are:
The / operator, as in log('hi') /when [true]. I use this when I need something tighter than a minus.
The - operator, as in log('hi') -when [true]. It also comes in another form: log('hi') -when- true. I use this most of the time because it seems easier to read.
The in operator, as in bind [x = 10] in x + 1. in has the same precedence as < and >, which is lower than the arithmetic operators. As a result, it's useful when you're binding variables or creating functions around simple expressions.
The <> operators, which are used around a modifier: log('hi') <unless> no_logging. These have the same precedence as in and the other relational operators.
The | operator. This is the lowest-precedence regular operator; the only things lower are &&, ||, ?:, assignment, and the comma.
The , operator. This is the lowest-precedence operator in Javascript. It can be dangerous to use because it left-associates; for example, f(x, y, z, where [z = 10]) will invoke f on just one parameter, since the where gobbles everything to its left. (Using a | here would fix the problem.)
The [] operator. This starts the precedence hierarchy over by using explicit grouping, as in bind[x = 10][log(x)].
when is one of the five conditional modifiers you can use. The others are unless, otherwise, when_defined, and unless_defined. The semantics and return values are:
x -when- y -> y && x
x -unless- y -> !y && x
x -otherwise- y -> x || y
x -when_defined- y -> y != null && x
x -unless_defined- y -> y == null && x
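Read as ordinary Javascript, the table above is just short-circuit evaluation. Here is a minimal sketch of the five modifiers as plain functions. The function names and the thunk-passing style are assumptions for illustration, not Caterwaul's generated code. Each operand is wrapped in a zero-argument function so that it is evaluated only when the && or || would evaluate it.

```javascript
// Illustrative helpers only; Caterwaul rewrites the syntax directly instead.
// Each operand is a thunk, so it is evaluated only when && or || would evaluate it.
const when           = (x, y) => y() && x();           // x -when- y
const unless         = (x, y) => !y() && x();          // x -unless- y
const otherwise      = (x, y) => x() || y();           // x -otherwise- y
const when_defined   = (x, y) => y() != null && x();   // x -when_defined- y
const unless_defined = (x, y) => y() == null && x();   // x -unless_defined- y

// Roughly what log('hi') -when['foo'.length === 3] does:
const logged = [];
const log = (s) => { logged.push(s); return s; };
when(() => log('hi'), () => 'foo'.length === 3);   // condition true: logs 'hi'
when(() => log('again'), () => 1 === 2);           // condition false: logs nothing
```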
Binding modifiers let you define locally-scoped variables. There is in fact only one such modifier, where, but it can also be called bind to read more naturally:
x -where [x = 10]
bind [x = 10] in x
bind [f(x) = x + 1] in f(7)
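Semantically, a binding behaves like wrapping the expression in a function that introduces the bound names as locals. Here is a sketch in plain Javascript. It assumes an immediately-invoked function as the model, and Caterwaul's actual output may differ.

```javascript
// x -where [x = 10]  and  bind [x = 10] in x
const a = (function (x) { return x; })(10);

// bind [f(x) = x + 1] in f(7)
const b = (function () {
  var f = function (x) { return x + 1; };
  return f(7);
})();

// The bound names stay local to the wrapper function.
const f_leaked = typeof f !== 'undefined';
```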
Despite the name, bind has nothing to do with this binding inside functions. (Though Caterwaul does provide some modifiers to handle that.) Previous versions of Caterwaul called this macro let, but let is a reserved word in recent versions of Javascript.
There are two words that create functions. One is given, which creates a regular function. The other is bgiven, which binds the function to the value of this where it was defined. For example:
given[x] in x + 1
x + 1 -given[x]
f.call(10) -where [f = this -given- x]
f.call(10) -where [f = this -bgiven- x]
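The only difference between given and bgiven is what this refers to inside the body. The sketch below shows that difference in plain Javascript. It uses Function.prototype.bind as one way to express the bgiven behavior, and define, outer, and target are hypothetical names for this example.

```javascript
// given[x] in x + 1 behaves like an ordinary one-argument function.
const inc = function (x) { return x + 1; };

// Hypothetical stand-in objects for this sketch.
const outer = { name: 'outer' };
const target = { name: 'target' };

function define() {
  // Like this -given- x: `this` is whatever the caller supplies.
  const f_given = function (x) { return this; };
  // Like this -bgiven- x: `this` is fixed to the defining scope's `this`.
  const f_bgiven = (function (x) { return this; }).bind(this);
  return [f_given, f_bgiven];
}

// Define both functions in a scope where `this` is outer.
const [f_given, f_bgiven] = define.call(outer);
```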
There's a shorthand you can use if you just have a single operand for a modifier:
x + 1 -given.x
given.x in x + 1
given.x [x + 1]
Side-effecting modifiers make it easy to manipulate values and return things without using an explicit variable. We do this in English using pronouns, and Caterwaul binds the variable it to refer to "the thing that we're working with."
There are two ways to create a side-effect. One is to return the result of the side-effecting expression and the other is to return the original value. For example, suppose you want to write a function hash(k, v) that returns a hash h such that h[k] === v. In plain Javascript you'd write this:
var hash = function (k, v) {
  var result = {};
  result[k] = v;
  return result;
};
However, the amount of typing required is much larger than the complexity of the problem. We want to return an object after applying a side-effect to it; to do this with Caterwaul we use the effect modifier (also called se, which stands for "side-effect"):
hash(k, v) = {} -effect [it[k] = v]
This style of side-effects returns the original expression. Sometimes, though, you want to return the result of the side-effect rather than the original. For example, here's a zero-division check in plain Javascript:
var x_over_yp1 = function (x, y) {
  var y_plus_1 = y + 1;
  return y_plus_1 === 0 ? 0 : x / y_plus_1;
};
Here's the same function using a returning side-effect:
x_over_yp1(x, y) = y + 1 -returning [it === 0 ? 0 : x / it]
The returning modifier is also called then and re. Note that it doesn't actually use return (that is, it won't exit the enclosing function early). It just produces a value locally, as any other expression would. This means you can chain them together:
log('hi') -then- log('again') -then- log('!')
Side-effecting won't impact the evaluation order of your code. That is, x -effect- y and x -returning- y will always evaluate x before y.
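Both side-effecting modifiers can be modeled as small higher-order functions that pass the left-hand value in as it. This is a sketch under that assumption: effect and returning are illustrative helpers, not Caterwaul APIs. Because x is evaluated as an argument before the callback runs, the sketch also reflects the evaluation-order guarantee above.

```javascript
// x -effect- y: evaluate x, run y with it = x, and return x.
const effect = (x, y) => { y(x); return x; };
// x -returning- y: evaluate x, run y with it = x, and return y's result.
const returning = (x, y) => y(x);

// hash(k, v) = {} -effect [it[k] = v]
const hash = (k, v) => effect({}, (it) => { it[k] = v; });

// x_over_yp1(x, y) = y + 1 -returning [it === 0 ? 0 : x / it]
const x_over_yp1 = (x, y) => returning(y + 1, (it) => (it === 0 ? 0 : x / it));
```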
Looping modifiers repeatedly execute an expression. There are four of them, each of which has only one name. over is used with arrays; it forms a map. For example:
log(it) -over- [1, 2, 3]
This not only invokes log on each element, but also returns an array of the results returned by each invocation of log(it).
Similar to over are over_keys and over_values, each of which operates on an object. Like over, each one returns an array of results (though unlike over, the array will not have any particular order). For example:
log(it) -over_keys- {foo: 'bar', bif: 'baz'}
log(it) -over_values- {foo: 1, bar: 2}
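In plain Javascript these three modifiers correspond roughly to a map over an array, over an object's keys, and over its values. Here is a sketch with hypothetical helper names. It uses Object.keys, which visits only own enumerable properties, and Caterwaul's own loop may enumerate differently.

```javascript
// log(it) -over- xs: map each element, collecting the results.
const over = (xs, f) => xs.map((it) => f(it));
// log(it) -over_keys- obj: map each key of the object.
const over_keys = (obj, f) => Object.keys(obj).map((it) => f(it));
// log(it) -over_values- obj: map each value of the object.
const over_values = (obj, f) => Object.keys(obj).map((k) => f(obj[k]));
```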
The final modifier is until, which does exactly what it sounds like:
x = 0, log(x) -until [++x >= 10]
You can use these basic modifiers, but if you plan on doing any heavy lifting you should check out the sequence library below. (I rarely use anything else for iterative functions.)
Most people won't use this, but it's handy if you're doing heavy-duty syntax analysis or writing complex macros. The standard library includes an obscure modifier called qs that you can use to quote a piece of code. Quotation is basically grabbing the literal syntax rather than evaluating it normally. For example:
qs[foo + bar]
qs[foo + bar].data
qs[foo + bar].length
qs[foo + bar][0]
Quotation is an idea that comes from Lisp and is handled similarly by Caterwaul. (The only difference is that Caterwaul returns its own n-ary syntax tree format instead of cons trees.)
A variant, qse, macroexpands the quoted code before returning it as a syntax tree. For example:
qse[log(foo) -unless[true]]
log(foo) -unless[true], qse
There are a few more modifiers that I threw into the standard library to make some edge cases easier:
'oh no!' -wobbly
'another error!' -chuck
null.foo -failover- log('got #{e}')
safely [alert(e)] in undefined.bar
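The failover example reads most naturally as a try/catch in which the caught exception is available as e. Here is a minimal sketch of that reading. It assumes the right-hand side runs only when the left-hand side throws, and failover here is a hypothetical helper, not a Caterwaul function.

```javascript
// Run body; if it throws, pass the exception (as e) to the fallback instead.
const failover = (body, fallback) => {
  try {
    return body();
  } catch (e) {
    return fallback(e);
  }
};

// Roughly: null.foo -failover- 'got #{e}'
const result = failover(() => null.foo, (e) => 'got ' + e);
```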
This is probably the gnarliest part of Caterwaul, but in my opinion it's also the most useful. The sequence library provides a modifier called seq that reinterprets the syntax it modifies as an APL-like domain-specific language. It generates very efficient code and lets you express maps, folds, cartesian products, zips, etc., with very little effort.
For instance, suppose we want an array of the first 10 squares. Using until, the algorithm looks like this:
bind [i = 0] in i*i -until [++i > 10]
Using the sequence library looks like this:
n[1, 11] *[x * x] /seq
The * operator is responsible for mapping, iterating, and flat-mapping. It's fairly easy to use; you just "multiply" a sequence by a bracketed expression. * will create a variable called x and evaluate your expression for each element in the sequence. It then collects these results and returns a new array. For example:
seq in [1, 2, 3] *['x = #{x}']
You don't have to use just arrays. You can use anything with a .length attribute and elements at [0] ... [n - 1]. One of the most common non-array collections I use is a jQuery selection (just be sure to wrap x again so that you're not dealing with a plain DOM node):
seq in $('div') *[$(x).attr('class')]
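Since only .length and integer indexing are required, a comprehension over an array-like value can be pictured as a plain indexed loop. Here is a sketch. seq_map is a hypothetical name, and this is not necessarily the loop seq generates.

```javascript
// Map over any array-like value using only .length and indices 0 .. length - 1.
const seq_map = (xs, f) => {
  const result = [];
  for (let i = 0; i < xs.length; i++) result.push(f(xs[i], i));
  return result;
};

// A hand-built array-like object stands in for a jQuery selection here.
const array_like = { length: 2, 0: 'a', 1: 'b' };
const labels = seq_map(array_like, (x) => 'x = ' + x);
```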
Most operators have an alternative form that does something similar to the original. You specify this form by using a ! after the operator. The alternative form of * is used to iterate without collecting the results; doing this returns the original array. For example:
seq in [1, 2, 3] *![log(x)]
The third use of * is flat-mapping, which is denoted by writing *~!. For example:
seq in [1, 2, 3] *~![[x, x + 1]]
Like the original form, these alternative forms can be combined with any of the operator features below.
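For comparison, the three forms of * line up with familiar Array methods. This sketch uses Array.prototype.map, forEach, and flatMap (the last requires an ES2019 engine). It is an analogy, not the code seq emits.

```javascript
const nums = [1, 2, 3];

// *[...]: map, collecting the results into a new array.
const mapped = nums.map((x) => 'x = ' + x);

// *![...]: iterate for side effects only, then hand back the original sequence.
const seen = [];
const same = (nums.forEach((x) => seen.push(x)), nums);

// *~![...]: flat-map, concatenating the arrays produced for each element.
const flat = nums.flatMap((x) => [x, x + 1]);
```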
The sequence library uses operators to describe operations on arrays. Most of them are regular binary infix operators like + and *, though a few of them have names (such as n[] above).
Despite the wide array of operators supported, there is a high degree of regularity among them. Each operator that takes a block (like * does) has several options that can be set to change the way it interprets the block.
Normally the expression inside [] is interpreted as a regular Javascript expression. But sometimes you want to remain in sequence context so that you don't have to explicitly modify the expression. To do that, you prefix the [] with a ~:
seq in [[1], [2], [3]] *~[x *[x + 1]]
In the example above we lost access to the outer x due to shadowing. To avoid this problem, the sequence language lets you rename any variable by prefixing the [] with a new variable name:
seq in [1, 2, 3] *y[y + 1]
You can use both of these options at the same time, yielding this:
seq in [[1], [2], [3]] *~y[y *[x + 1]]
Note that you can't say *y~[...], as this is invalid Javascript syntax (~ is always a unary operator).
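Nesting a comprehension with ~ corresponds to nesting one map inside another, and renaming corresponds to choosing a different parameter name for the outer function. Here is a plain-Javascript sketch of the two earlier examples, offered as an analogy rather than seq's actual output.

```javascript
// seq in [[1], [2], [3]] *~[x *[x + 1]]
// The inner x shadows the outer one, just as in the comprehension.
const shadowed = [[1], [2], [3]].map((x) => x.map((x) => x + 1));

// seq in [[1], [2], [3]] *~y[y *[x + 1]]
// Naming the outer element y keeps both y and x in scope inside the block.
const renamed = [[1], [2], [3]].map((y) => y.map((x) => x + 1));
```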
The filtering family of operators is denoted by %. For instance, here's a way to get multiples of three:
seq in [1, 2, 3] %[x % 3 === 0]
Negation has such high precedence that it's often difficult to work into a predicate without adding parentheses. The alternative form of % negates the predicate for you:
seq in [1, 2, 3] %![x % 3]
The other alternative form of % is a simultaneous map/filter. The idea is to return the expression value when it's truthy and drop the element otherwise. For example, we can get the squares of all negative elements this way:
seq in [1, -2, -3, 4] %~![x < 0 && x * x]
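The three members of the % family map onto a filter, a negated filter, and a map followed by a filter. Here is a sketch in plain Javascript, as an analogy rather than seq's generated code.

```javascript
const values = [1, 2, 3];

// %[x % 3 === 0]: keep elements whose predicate is truthy.
const multiples = values.filter((x) => x % 3 === 0);

// %![x % 3]: keep elements whose predicate is falsy.
const negated = values.filter((x) => !(x % 3));

// %~![x < 0 && x * x]: compute the block, then drop falsy results.
const squares_of_negatives = [1, -2, -3, 4]
  .map((x) => x < 0 && x * x)
  .filter((v) => v);
```

Note that the map/filter form drops any falsy result, so a block that legitimately produces 0 would lose that element.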
You can fold a sequence through a binary expression by using the / family of operators. / has two forms: left fold (the default) and right fold (written as /!). For example, here is how you might sum a bunch of numbers:
seq in [1, 2, 3] /[x + x0]
Since + is associative, it doesn't matter which direction the fold goes. The direction becomes obvious, however, if we interpolate the values into a string:
seq in [1, 2, 3] /['[#{x}, #{x0}]']
seq in [1, 2, 3] /!['[#{x}, #{x0}]']
Notice that for folding we have a new variable x0. There are actually a few variables you have access to depending on what you're doing. Inside any block you'll have x, xi (the current index), and xl (the length of the original sequence); x0 is available only when folding. Each of these changes uniformly if you rename the variable, so for instance:
seq in [1, 2, 3] /bar[bar + bar0 + bari + barl]
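Left and right folds correspond to Array.prototype.reduce and reduceRight. In this sketch x0 plays the role of the accumulated value and x the next element. That assignment of roles, and the foldl/foldr names, are assumptions rather than a description of Caterwaul's generated code.

```javascript
// Fold with a two-argument block f(x, x0), where x0 is the running result.
const foldl = (xs, f) => xs.reduce((x0, x) => f(x, x0));
const foldr = (xs, f) => xs.reduceRight((x0, x) => f(x, x0));

const sum = foldl([1, 2, 3], (x, x0) => x + x0);
const left = foldl([1, 2, 3], (x, x0) => '[' + x + ', ' + x0 + ']');
const right = foldr([1, 2, 3], (x, x0) => '[' + x + ', ' + x0 + ']');
```

Like reduce without an initial value, these helpers throw on an empty array.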
The sequence library provides existential quantification on arrays. Each of these uses a block that acts as a predicate. So, for instance, to determine whether any element in an array is positive:
[-4, -5, 10, 2] |[x > 0] |seq
The | operator returns the first truthy value generated by the expression (not just true or false), so you can use it to find things too. For example, this block makes the comprehension return the first positive element itself rather than just true:
[-4, -5, 10, 2] |[x > 0 && x] |seq
[-4, -5, 10, 2] |[x -when[x > 0]] |seq
We can also use this construct to return the index of the first matching element. Because an index of 0 is falsy, we'll have to add one (so 0 is the not-found value rather than -1):
[-4, -5, 10, 2] |[xi + 1 -when[x > 0]] |seq
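The | operator behaves like a short-circuiting search that stops at the first truthy block value. Here is a sketch in plain Javascript. first_truthy is a hypothetical name, and returning false when nothing matches is an assumption of this sketch.

```javascript
// Evaluate the block for each element (with its index) and return the first
// truthy result, stopping as soon as one is found.
const first_truthy = (xs, block) => {
  for (let xi = 0; xi < xs.length; xi++) {
    const value = block(xs[xi], xi);
    if (value) return value;
  }
  return false;   // assumed not-found value for this sketch
};

const data = [-4, -5, 10, 2];
const any_positive = first_truthy(data, (x) => x > 0);             // true
const first_positive = first_truthy(data, (x) => x > 0 && x);      // 10
const position = first_truthy(data, (x, xi) => x > 0 && xi + 1);   // 3
```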
There are three ways you can combine things. The most obvious is concatenation, written +:
seq in [1, 2, 3] + [4, 5, 6]
Less obvious are zipping, written ^, and the cartesian product, written -. Because ^ has lower precedence than in, we have to switch to a lower-precedence modifier form for seq. For example:
[1, 2, 3] ^ [4, 5, 6] |seq
The cartesian product takes every possible pairing of elements from the two sequences:
seq in [1, 2, 3] - [4, 5, 6]
Each of these operators has lower precedence than *, /, and % (all of which have equal precedence), so they can be used without parentheses.
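The three combining operators can be sketched with plain array methods. Representing zipped and paired elements as two-element arrays is an assumption here, as is truncating the zip to the length of the first sequence.

```javascript
const a = [1, 2, 3], b = [4, 5, 6];

// a + b: concatenation.
const joined = a.concat(b);

// a ^ b: zip corresponding elements into pairs.
const zipped = a.map((x, i) => [x, b[i]]);

// a - b: every possible pairing of an element of a with an element of b.
const all_pairs = a.flatMap((x) => b.map((y) => [x, y]));
```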
A really useful and important feature of the sequence library is that it works with objects very easily. It has four operators, /keys, /values, /pairs, and |object, that can convert between objects and arrays.
You can pull an array of the keys or values of an object (not in any particular order, of course) by using /keys and /values. For example:
window /keys -seq
jQuery /values -seq
More interesting is the /pairs operator. This pulls out key-value pairs as two-element arrays:
{foo: 'bar', bif: 'baz'} /pairs -seq
Its inverse is the |object operator, which turns an array of those pairs back into an object:
[['foo', 'bar'], ['bif', 'baz']] |object |seq
Note the differing precedences of /keys etc. and |object. This is intentional. The rationale is that you rarely manipulate objects as objects in sequence comprehensions, since the sequence library has no useful operators for objects other than unpacking. Instead, objects enter a comprehension at the start, where the tightly-binding /keys, /values, and /pairs unpack them, and the comprehension may at the very end zip an intermediate result into a final object, which is where the loosely-binding |object applies.
I may change this in the future as I use it more, but any changes will be backwards-compatible.
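The four object operators have close counterparts among modern built-ins. This sketch uses Object.keys, Object.values, Object.entries, and Object.fromEntries (ES2017 and ES2019 additions, so newer than Caterwaul itself). They are an analogy, not what seq expands to.

```javascript
const obj = { foo: 'bar', bif: 'baz' };

const keys = Object.keys(obj);         // like obj /keys
const values = Object.values(obj);     // like obj /values
const kv = Object.entries(obj);        // like obj /pairs: [[key, value], ...]
const back = Object.fromEntries(kv);   // like kv |object
```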
Within a sequence comprehension you have access to the n[] operator, which generates arrays of evenly-spaced numbers. It has three uses. When invoked on one argument it returns integers between 0, inclusive, and the number, exclusive. When invoked with two arguments the first becomes the inclusive lower bound and the second is the exclusive upper bound. Adding a third argument changes the increment from its default value of 1. For example:
n[10] -seq
n[5, 8] -seq
n[0, 1, 0.25] -seq
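Here is a minimal plain-Javascript sketch of the same range semantics. It assumes a positive step (a step of zero or less would loop forever here), and the helper name n simply mirrors the operator.

```javascript
// One argument: [0, a). Two or three arguments: [a, b) with the given step.
const n = (a, b, step = 1) => {
  const lower = b === undefined ? 0 : a;
  const upper = b === undefined ? a : b;
  const result = [];
  // Computing each value as lower + k * step avoids accumulating rounding error.
  for (let k = 0; lower + k * step < upper; k++) result.push(lower + k * step);
  return result;
};

// Plain-Javascript counterpart of n[1, 11] *[x * x] /seq
const squares = n(1, 11).map((x) => x * x);
```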
Most compilers operate offline; that is, they generate standalone code with no references back to the compiler. However, there are some cases where you want to interact with code as it's running. Caterwaul's tracing extension is one way to do this.
The idea behind a trace is that you can observe (1) when an expression is about to be evaluated and (2) the value it produces once evaluated. Caterwaul does this by inserting hook functions into your source; ideally these functions don't change any behavior (other than making your code a bit slower), and they let you see what's happening. You decide what to do with the observed expressions and values.
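Here is a minimal sketch of the hook idea in plain Javascript. Each instrumented expression is wrapped in a call that reports it before and after evaluation and then returns the original value. make_hook and the string IDs are hypothetical; Caterwaul inserts its own hooks automatically and passes syntax trees rather than labels.

```javascript
// Returns a hook(id, thunk) that reports around each evaluation.
const make_hook = (before, after) => (id, thunk) => {
  if (before) before(id);
  const value = thunk();
  if (after) after(id, value);
  return value;   // the instrumented code still sees the original value
};

// Hand-instrumented version of f(n) = n ? n * f(n - 1) : 1
const observed = [];
const hook = make_hook(null, (id, value) => observed.push([id, value]));
const f = (n) => hook('f-body', () => (n ? n * f(n - 1) : 1));

const answer = f(5);   // 120; observed holds one entry per call, innermost first
```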
Here's an example of defining a function and then tracing it (note that Caterwaul doesn't provide the trace() function used here):
f(n) = n ? n * f(n - 1) : 1
f = trace(f)
f(5)
If you run these statements and scroll back a bit you'll see these huge gnarly expressions with variables like gensym_1_gnhnr4un_bwv17j. This is called a gensym (the term comes from Lisp parlance), and Caterwaul uses variables like this when it needs a unique name. In this case we're seeing gensyms because this is how Caterwaul names its trace functions.
Sometimes you want to do something besides listing the expression values; for example, you might want to profile your code. To do this, you need to construct your own tracer by calling caterwaul.tracer(), which takes two optional callbacks and returns a trace function. (The trace() function used above is the result of caterwaul.tracer().) The first callback, if it is defined, will be invoked on each syntax tree before that tree is evaluated. The second callback will be invoked on the syntax tree and the value that it produced. Based on this information we can now construct a very simple profiler that counts the number of evaluations of each expression:
counts = {}, trees = {}
count(tree) = trees[tree.id()] = tree -effect [counts[tree.id()] = (counts[tree.id()] || 0) + 1]
profile = caterwaul.tracer(null, count)
Now let's profile something:
is_prime(n) = n > 1 && !(n[2, Math.floor(Math.sqrt(n)) + 1] |[n % x === 0] |seq)
takes_a_bit() = n[10000] %[is_prime(x)] /seq
profile(takes_a_bit)()
At this point the profiling data is in counts and trees. counts maps tree IDs to the number of times that tree was evaluated, and trees maps tree IDs to the trees they represent. Let's stash the tree-count pair list into its own variable:
pairs = counts /pairs *[[trees[x[0]], x[1]]] /seq
This is a complete profile, but maybe we don't want that much information at once. Let's just look for trees that represent push() invocations:
pairs %[qs[_x.push(_y)].match(x[0])] /seq
Here we're using some methods provided by syntax trees. We first quote a pattern (which is an instance of Caterwaul's syntax tree class), and then we call its match() method on another tree. match() returns an object if x[0] matches qs[_x.push(_y)], and false otherwise. For the purposes of matching, identifiers that start with underscores can match against any expression. The object that match() returns maps the names of these wildcards to the trees they matched against.
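To illustrate how wildcard matching works, here is a sketch over a toy tree format of { data, children } objects. This format, the node and match helpers, and the way the push() call is encoded are all hypothetical and are not Caterwaul's syntax tree class.

```javascript
// A toy node: a string label plus child nodes.
const node = (data, ...children) => ({ data, children });

// Returns an object mapping wildcard names to matched subtrees, or false.
const match = (pattern, tree, bindings = {}) => {
  // A childless pattern node whose label starts with "_" matches anything.
  if (pattern.children.length === 0 && pattern.data.charAt(0) === '_') {
    bindings[pattern.data] = tree;
    return bindings;
  }
  // Otherwise labels and arity must agree, and every child must match.
  if (pattern.data !== tree.data || pattern.children.length !== tree.children.length) return false;
  for (let i = 0; i < pattern.children.length; i++) {
    if (!match(pattern.children[i], tree.children[i], bindings)) return false;
  }
  return bindings;
};

// Toy encodings of _x.push(_y) and result.push(a + b).
const pattern = node('()', node('.', node('_x'), node('push')), node('_y'));
const tree = node('()', node('.', node('result'), node('push')), node('+', node('a'), node('b')));
const m = match(pattern, tree);   // { _x: <result node>, _y: <a + b node> }
```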
Here are some other queries you could perform:
pairs %[qs[is_prime(_x)].match(x[0])] /seq
pairs %[x[1] > 5000] /seq
Because Caterwaul is written in Javascript, it's very easy to add to your application:
<script src='/path/to/caterwaul.js'></script>
<script src='/path/to/caterwaul-extension.js'></script>
<script src='/path/to/application-code.js'></script>
The application code can then refer to caterwaul to access the compiler.