Introduction

Caterwaul is a pure Javascript compiler that lets you change the semantics of functions. To do this it implements a modular decompiler, macroexpander, and compiler that allow you to manipulate code in a first-class way. It also comes with several macro (Lisp-style, not C-style) libraries to make Javascript more fun (though you can easily disable them and/or write your own).

A shell is available to interactively use Caterwaul while reading the tutorial below.

Javascript extensions

Caterwaul's core macro set starts by extending Javascript syntax in some helpful ways. In particular, it enables quick function assignment and Ruby-style string interpolation:

f(x) = x + 1

String.prototype.say_hi() = 'hi from #{this}!'

Caterwaul translates these expressions into this:

f = function (x) {
  return x + 1;
};
String.prototype.say_hi = function () {
  return 'hi from ' + (this) + '!';
};

String interpolation and function assignment are the only irregular syntactic forms provided by Caterwaul. Everything else is implemented as a regular form called a modifier.

Modifiers

A modifier is a word that is used with an operator to modify a bit of syntax. For example, Caterwaul provides a modifier called when to execute things conditionally:

log('hi') -when['foo'.length === 3]

There are two parts to a modifier. The first is the operator you use with it (in this case minus), and the second is the modifier and any arguments it takes. The operator is very important; it determines how much stuff you're modifying. For example:

log('hi'), log('again') -when[1 === 2]

Here the when[1 === 2] only modifies log('again') because minus has much higher precedence than the comma operator. However, Caterwaul lets you use several other operators to change this:

log('hi'), log('again'), when[1 === 2]

In this case the when[1 === 2] modifies both log statements. The reason for this is kind of subtle: comma left-associates, so the first comma was collapsed into a single syntax node that then became the left-hand side of the second comma. Because Caterwaul operates on the structure of your code, it groups both log statements into the conditional.
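To make the grouping concrete, here is a rough plain-Javascript picture of the two forms. It's only a sketch: it assumes when behaves like the && form, and the recording log function is a stand-in I made up so the difference is observable. The code Caterwaul actually generates may look different.

```javascript
// Stand-in logger (hypothetical) that records its calls.
var calls = [];
var log = function (message) { calls.push(message); return message; };

// log('hi'), log('again') -when[1 === 2]
// Minus binds tighter than comma, so only the second call is guarded.
var minusForm = function () {
  calls = [];
  log('hi'), (1 === 2) && log('again');
  return calls;
};

// log('hi'), log('again'), when[1 === 2]
// The modifier receives the whole left-hand comma node, so both calls are guarded.
var commaForm = function () {
  calls = [];
  (1 === 2) && (log('hi'), log('again'));
  return calls;
};
```

Calling minusForm() records just 'hi', while commaForm() records nothing at all.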

Modifier operators

There are seven different operators you can use in conjunction with a modifier. From highest to lowest precedence they are:

The slash. For example, log('hi') /when [true]. I use this when I need something tighter than a minus.
The minus. For example, log('hi') -when [true]. It also comes in another form: log('hi') -when- true. I use this most of the time because it seems easier to read.
The in operator. For example, bind [x = 10] in x + 1. in has the same precedence as < and >, which is lower than the arithmetic operators. As a result, it's useful when you're binding variables or creating functions around simple expressions.
The <> operators. These are used around a modifier: log('hi') <unless> no_logging. This has the same precedence as in and other relational operators.
The | operator. This is the lowest-precedence regular operator; the only things lower are &&, ||, ?:, assignment, and the comma.
The , operator. This is the lowest-precedence operator in Javascript. It can be dangerous to use because it left-associates; for example, f(x, y, z, where [z = 10]) will invoke f on just one parameter, since the where gobbles everything to its left. (Using a | here would fix the problem.)
The [] operator. This starts the precedence hierarchy over by using explicit grouping. For example, bind[x = 10][log(x)].

Conditional modifiers

when is one of the five conditional modifiers you can use. The others are unless, otherwise, when_defined, and unless_defined. The semantics and return values are:

x -when- y           -> y && x
x -unless- y         -> !y && x
x -otherwise- y      -> x || y
x -when_defined- y   -> y != null && x
x -unless_defined- y -> y == null && x
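The difference between when and when_defined only shows up for values that are falsy but not null or undefined. Here is a small plain-Javascript sketch of the expansions above. The helper function names are mine, and unlike the real && forms these ordinary functions evaluate both arguments eagerly.

```javascript
// Plain-Javascript stand-ins for three of the expansions listed above.
var when = function (x, y) { return y && x; };
var whenDefined = function (x, y) { return y != null && x; };
var unlessDefined = function (x, y) { return y == null && x; };

// 0 is falsy but defined, so the two modifiers disagree on it.
var viaWhen = when('value', 0);                    // 0
var viaWhenDefined = whenDefined('value', 0);      // 'value'
var viaUnlessDefined = unlessDefined('value', 0);  // false
```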

Binding modifiers

These let you define locally-scoped variables. There is in fact only one such modifier, where, but it can also be called bind to read more naturally:

x -where [x = 10]

bind [x = 10] in x

bind [f(x) = x + 1] in f(7)

Despite the name, bind has nothing to do with this binding inside functions. (Though Caterwaul does provide some modifiers to handle that.) Previous versions of Caterwaul called this macro let, but it's a reserved word in recent versions of Javascript.
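One way to picture these bindings in plain Javascript is as an immediately-invoked function with local variables. This is just a sketch of the idea, not necessarily the code Caterwaul emits, and the result variable names are hypothetical.

```javascript
// x -where [x = 10]   or   bind [x = 10] in x
var boundValue = (function () {
  var x = 10;          // x is local to this scope only
  return x;
})();

// bind [f(x) = x + 1] in f(7)
var boundCall = (function () {
  var f = function (x) { return x + 1; };
  return f(7);
})();
```

Neither x nor f leaks into the surrounding scope, which is the point of the modifier.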

Function modifiers

There are two words that create functions. One is given, which creates a regular function. The other is bgiven, which binds the function to the this where it was defined. For example:

given[x] in x + 1

x + 1 -given[x]

f.call(10) -where [f = this -given- x]

f.call(10) -where [f = this -bgiven- x]

There's a shorthand you can use if you just have a single operand for a modifier:

x + 1 -given.x

given.x in x + 1

given.x [x + 1]
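The given/bgiven distinction is the same one you get in plain Javascript between an ordinary function and one fixed with Function.prototype.bind. Here's a sketch; the owner and caller objects are made up for illustration, and I use objects rather than the number 10 to sidestep primitive boxing.

```javascript
// Hypothetical object standing in for the "this" where the functions are defined.
var owner = {
  name: 'owner',
  make: function () {
    return {
      // Like given: "this" is whatever the caller supplies.
      given: function (x) { return this.name; },
      // Like bgiven: "this" stays fixed to the defining context.
      bgiven: (function (x) { return this.name; }).bind(this)
    };
  }
};

var caller = {name: 'caller'};
var fns = owner.make();
var givenThis = fns.given.call(caller);    // 'caller'
var bgivenThis = fns.bgiven.call(caller);  // 'owner'
```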

Side-effecting modifiers

These make it easy to manipulate values and return things without using an explicit variable. We do this in English using pronouns, and Caterwaul binds the variable it to refer to "the thing that we're working with."

There are two ways to create a side-effect. One is to return the side-effecting expression and the other is to return the original value. For example, suppose you want to write a function hash(k, v) that returns a hash h such that h[k] === v. In plain Javascript you'd write this:

var hash = function (k, v) {
  var result = {};
  result[k] = v;
  return result;
};

However, the amount of typing required is much larger than the complexity of the problem. We want to return an object after applying a side-effect to it; to do this with Caterwaul we would use the effect modifier (also called se, which stands for "side-effect"):

hash(k, v) = {} -effect [it[k] = v]

This style of side-effects returns the original expression. Sometimes, though, you want to return the result of the side-effect rather than the original. For example, here's a zero-division check in plain Javascript:

var x_over_yp1 = function (x, y) {
  var y_plus_1 = y + 1;
  return y_plus_1 === 0 ? 0 : x / y_plus_1;
};

Here's the same function using a returning side-effect:

x_over_yp1(x, y) = y + 1 -returning [it === 0 ? 0 : x / it]

The returning modifier is also called then and re. Note that it doesn't actually use return (i.e. it won't jump out of stuff to return from a function). It just returns a value locally as any other expression would. This means you can chain them along:

log('hi') -then- log('again') -then- log('!')

Side-effecting won't impact the evaluation order of your code. That is, x -effect- y and x -returning- y will always evaluate x before y.
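If it helps, you can think of the two styles as a pair of tiny helper functions in plain Javascript. This is a sketch of the behavior described above, with "it" passed as a function argument; the helper names are mine and this isn't how Caterwaul implements the modifiers.

```javascript
// x -effect [...]: run the block with "it" bound to x, then return x itself.
var effect = function (it, block) { block(it); return it; };

// x -returning [...]: run the block with "it" bound to x, return the block's value.
var returning = function (it, block) { return block(it); };

var hash = function (k, v) {
  return effect({}, function (it) { it[k] = v; });
};

var x_over_yp1 = function (x, y) {
  return returning(y + 1, function (it) { return it === 0 ? 0 : x / it; });
};
```

In both helpers the value is computed before the block runs, matching the evaluation order noted above.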

Looping modifiers

These repeatedly execute an expression. There are four looping modifiers, each of which has only one name. over is used with arrays; it forms a map. For example:

log(it) -over- [1, 2, 3]

This not only invokes log on each element, but returns an array of the results returned by each invocation of log(it).

Similar to over are over_keys and over_values, each of which operates on an object. Like over, each one returns an array of results (though unlike over the array will not have any particular order). For example:

log(it) -over_keys- {foo: 'bar', bif: 'baz'}

log(it) -over_values- {foo: 1, bar: 2}
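In plain Javascript these three modifiers correspond roughly to map over an array, over an object's keys, and over its values. The sketch below swaps log for bodies that return something visible; the variable names are hypothetical.

```javascript
// body -over- [1, 2, 3], with a body that doubles "it"
var overResult = [1, 2, 3].map(function (it) { return it * 2; });

var source = {foo: 1, bar: 2};

// body -over_keys- source, with a body that upper-cases "it"
var keysResult = Object.keys(source).map(function (it) { return it.toUpperCase(); });

// body -over_values- source, with a body that adds one to "it"
var valuesResult = Object.keys(source).map(function (key) { return source[key] + 1; });
```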

The final modifier is until, which does exactly what it sounds like:

x = 0, log(x) -until [++x >= 10]

You can use these basic modifiers, but if you plan on doing any heavy lifting you should check out the sequence library below. (I rarely use anything else for iterative functions.)

Quotation

Most people won't use this, but it's handy if you're doing heavy-duty syntax analysis or writing complex macros. The standard library includes an obscure modifier called qs that you can use to quote a piece of code. Quotation is basically grabbing the literal syntax rather than evaluating it normally. For example:

qs[foo + bar]

qs[foo + bar].data

qs[foo + bar].length

qs[foo + bar][0]

Quotation is an idea that comes from Lisp and is handled similarly by Caterwaul. (The only difference is that Caterwaul returns its own n-ary syntax tree format instead of cons trees.)

A variant, qse, macroexpands the quoted code before returning it as a syntax tree. For example:

qse[log(foo) -unless[true]]

log(foo) -unless[true], qse

Other modifiers

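There are a few more modifiers that I threw into the standard library to make some edge cases easier. wobbly and chuck throw their operand as an error; failover and safely catch an error thrown by the guarded expression and evaluate a handler with the error bound to e: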

'oh no!' -wobbly

'another error!' -chuck

null.foo -failover- log('got #{e}')

safely [alert(e)] in undefined.bar

Sequence library

This is probably the gnarliest part of Caterwaul, but in my opinion it's also the most useful. The sequence library provides a modifier called seq that reinterprets some syntax within an APL-like domain-specific language. It generates very efficient code and lets you express maps, folds, cartesian products, zips, etc, with very little effort.

For instance, suppose we want an array of the first 10 squares. Using until, the algorithm looks like this:

bind [i = 0] in i*i -until [++i > 10]

Using the sequence library looks like this:

n[1, 11] *[x * x] /seq

Mapping and iterating

The * operator is responsible for mapping, iterating, and flat-mapping. It's fairly easy to use; you just "multiply" a sequence by a bracketed expression. * will create a variable called x and evaluate your expression for each element in the sequence. It then collects these results and returns a new array. For example:

seq in [1, 2, 3] *['x = #{x}']

You don't have to use just arrays. You can use anything with a .length and [0] ... [n - 1] attributes. One of the most common non-array collections I use is a jQuery selector (just be sure to wrap x again so that you're not dealing with a plain DOM node):

seq in $('div') *[$(x).attr('class')]

Alternative forms

Most operators have an alternative form that does something similar to the original. You specify this form by using a ! after the operator. The alternative form of * is used to iterate without collecting the results; doing this returns the original array. For example:

seq in [1, 2, 3] *![log(x)]

The third use of * is flat-mapping, which is denoted by writing *~!. For example:

seq in [1, 2, 3] *~![[x, x + 1]]

Like the original form, these alternative forms can be combined with any of the operator features below.
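For comparison, here's roughly what the three forms of * do, written in plain Javascript. It's a sketch of the behavior described above rather than the code the sequence library generates, and the each helper is something I made up for the example.

```javascript
var xs = [1, 2, 3];

// xs *['x = #{x}']: map, collecting the results into a new array.
var mapped = xs.map(function (x) { return 'x = ' + x; });

// xs *![...]: iterate for side effects only, then hand back the original array.
var visited = [];
var each = function (array, body) { array.forEach(body); return array; };
var same = each(xs, function (x) { visited.push(x); });

// xs *~![[x, x + 1]]: flat-map, concatenating the arrays the block returns.
var flat = xs.reduce(function (acc, x) { return acc.concat([x, x + 1]); }, []);
```

Here flat comes out as [1, 2, 2, 3, 3, 4], and same is the very same array object as xs.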

Operator features

The sequence library uses operators to describe operations on arrays. Most of them are regular binary infix operators like + and *, though a few of them have names (such as n[] above).

Despite the wide array of operators supported, there is a high degree of regularity among them. Each operator that takes a block (like * does) has several options that can be set to change the way it interprets the block.

Sequence interpretation

Normally the expression inside [] is interpreted as a regular Javascript expression. But sometimes you want to remain in sequence context so that you don't have to explicitly modify the expression. To do that, you prefix the [] with a ~:

seq in [[1], [2], [3]] *~[x *[x + 1]]

Variable renaming

In the example above we lost access to the outer x due to shadowing. To avoid this problem, the sequence language lets you rename any variable by prefixing the [] with a new variable name:

seq in [1, 2, 3] *y[y + 1]

You can use both of these options at the same time, yielding this:

seq in [[1], [2], [3]] *~y[y *[x + 1]]

Note that you can't say *y~[...], as this is invalid Javascript syntax (~ is always a unary operator).

Filtering

The filtering family of operators is denoted by %. For instance, here's a way to get multiples of three:

seq in [1, 2, 3] %[x % 3 === 0]

Alternative forms

Negation is so high precedence that it's often difficult to work it into a form without adding parentheses. The alternative form of % negates the predicate:

seq in [1, 2, 3] %![x % 3]

The other alternative form of % is a simultaneous map/filter. The idea is to return the expression value when it's truthy and drop the element otherwise. For example, we can get the squares of all negative elements this way:

seq in [1, -2, -3, 4] %~![x < 0 && x * x]
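The three filtering forms line up with plain Javascript like this. It's only a sketch of the semantics described above, and the variable names are hypothetical.

```javascript
var xs = [1, 2, 3];

// xs %[x % 3 === 0]: keep elements for which the block is truthy.
var multiples = xs.filter(function (x) { return x % 3 === 0; });

// xs %![x % 3]: keep elements for which the block is falsy.
var alsoMultiples = xs.filter(function (x) { return !(x % 3); });

// ys %~![x < 0 && x * x]: keep the block's value when truthy, drop the element otherwise.
var squares = [1, -2, -3, 4]
  .map(function (x) { return x < 0 && x * x; })
  .filter(function (value) { return value; });
```

Both of the first two produce [3], and squares is [4, 9].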

Folding

You can fold stuff through a binary expression by using the / family of operators. / has two forms: left fold (the default), and right fold (written as /!). For example, here is how you might sum a bunch of numbers:

seq in [1, 2, 3] /[x + x0]

Since + is associative it doesn't matter which direction the fold goes. The direction does become obvious, however, if we interpolate the values into a string:

seq in [1, 2, 3] /['[#{x}, #{x0}]']

seq in [1, 2, 3] /!['[#{x}, #{x0}]']

Notice that for folding we have a new variable x0. There are actually a few variables you have access to depending on what you're doing. Inside any block you'll have x, xi (the current index), and xl (the length of the original sequence). x0 is available only when folding. Each of these changes uniformly if you rename the variable; so for instance:

seq in [1, 2, 3] /bar[bar + bar0 + bari + barl]
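Plain Javascript has the same pair of folds in reduce and reduceRight. The sketch below assumes x0 plays the role of the accumulator and x the current element; that mapping is my reading of the examples above, so treat the printed shapes as the output of this plain-Javascript analog rather than of Caterwaul itself.

```javascript
// Assumption: x0 is the running accumulator, x is the current element.
var sum = [1, 2, 3].reduce(function (x0, x) { return x + x0; });

// Left fold: starts from the first element and works rightwards.
var left = [1, 2, 3].reduce(function (x0, x) {
  return '[' + x + ', ' + x0 + ']';
});

// Right fold: starts from the last element and works leftwards.
var right = [1, 2, 3].reduceRight(function (x0, x) {
  return '[' + x + ', ' + x0 + ']';
});
```

With these definitions left is '[3, [2, 1]]' and right is '[1, [2, 3]]', so the nesting shows which end the fold started from.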

Quantification

The sequence library provides existential quantification on arrays, written |. It uses a block that acts as a predicate. So, for instance, to determine whether any element in an array is positive:

[-4, -5, 10, 2] |[x > 0] |seq

The | operator returns the first truthy value generated by the expression (not just true or false), so you can use it to detect things too. This block causes the sequence comprehension to return not only whether an element is positive, but if so the first such element will be returned:

[-4, -5, 10, 2] |[x > 0 && x] |seq

[-4, -5, 10, 2] |[x -when[x > 0]] |seq

We can also use this construct to return the index of the first matching element. Because an index of 0 is falsy, we'll have to add one (so 0 is the not-found value rather than -1):

[-4, -5, 10, 2] |[xi + 1 -when[x > 0]] |seq
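Here's a plain-Javascript sketch of that "first truthy value" behavior. The exists helper is hypothetical, and returning false when nothing matches is my assumption about the not-found case.

```javascript
// Returns the first truthy value produced by the block, or false if there is none.
var exists = function (xs, block) {
  for (var xi = 0; xi < xs.length; ++xi) {
    var result = block(xs[xi], xi);
    if (result) return result;
  }
  return false;
};

var data = [-4, -5, 10, 2];
var anyPositive = exists(data, function (x) { return x > 0; });
var firstPositive = exists(data, function (x) { return x > 0 && x; });
var indexPlusOne = exists(data, function (x, xi) { return x > 0 && xi + 1; });
```

Here anyPositive is true, firstPositive is 10, and indexPlusOne is 3 (the element sits at index 2).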

Combination

There are three ways you can combine things. The most obvious is concatenation, written +:

seq in [1, 2, 3] + [4, 5, 6]

Less obvious are zipping, written ^, and the cartesian product, written -. Because ^ has lower precedence than in, we have to switch to a lower-precedence modifier form for seq. For example:

[1, 2, 3] ^ [4, 5, 6] |seq

The cartesian product takes every possible pairing of elements from the two sequences:

seq in [1, 2, 3] - [4, 5, 6]

Each of these operators has lower precedence than *, /, and % (all of which have equal precedence), so they can be used without parentheses.
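In plain Javascript the three combinations look roughly like this. It's a sketch: I'm assuming zipping pairs elements by index and that each pairing comes back as a two-element array, neither of which is spelled out above.

```javascript
var a = [1, 2, 3], b = [4, 5, 6];

// a + b: concatenation.
var concatenated = a.concat(b);

// a ^ b: zip, assumed here to pair elements by index.
var zipped = a.map(function (x, i) { return [x, b[i]]; });

// a - b: every possible pairing of an element of a with an element of b.
var pairings = a.reduce(function (acc, x) {
  return acc.concat(b.map(function (y) { return [x, y]; }));
}, []);
```

concatenated has six elements, zipped has three pairs, and pairings has nine.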

Objects

A really useful and important feature of the sequence library is that it works with objects very easily. It has four operators, /keys, /values, /pairs, and |object, that can convert between objects and arrays.

You can pull an array of the keys or values of an object (not in any particular order of course) by using /keys and /values. For example:

window /keys -seq

jQuery /values -seq

More interesting is the /pairs operator. This pulls out key-value pairs as two-element arrays:

{foo: 'bar', bif: 'baz'} /pairs -seq

Its inverse is the |object operator, which turns an array of those pairs back into an object:

[['foo', 'bar'], ['bif', 'baz']] |object |seq

Note the differing precedences of /keys etc. and |object. This is intentional. The rationale is that you rarely manipulate objects as objects in sequence comprehensions, since the sequence library has no useful operators for objects other than unpacking. Therefore, objects come from various other values and enter a sequence comprehension, which may at the very end zip an intermediate result into a final object return value.

I may change this in the future as I use it more, but any changes will be backwards-compatible.
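For reference, the four object operators behave much like the following plain Javascript. This is a sketch of the semantics only; the variable names are made up.

```javascript
var source = {foo: 'bar', bif: 'baz'};

// source /keys and source /values
var keys = Object.keys(source);
var values = keys.map(function (k) { return source[k]; });

// source /pairs: key-value pairs as two-element arrays.
var pairs = keys.map(function (k) { return [k, source[k]]; });

// pairs |object: the inverse, turning the pairs back into an object.
var rebuilt = pairs.reduce(function (object, pair) {
  object[pair[0]] = pair[1];
  return object;
}, {});
```

Unpacking with pairs and repacking gives an object with the same keys and values as source.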

Numerical iteration

Within a sequence comprehension you have access to the n[] operator, which generates arrays of evenly-spaced numbers. It has three uses. When invoked on one argument it returns integers between 0, inclusive, and the number, exclusive. When invoked with two arguments the first becomes the inclusive lower bound and the second is the exclusive upper bound. Adding a third argument changes the increment from its default value of 1. For example:

n[10] -seq

n[5, 8] -seq

n[0, 1, 0.25] -seq
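The three uses can be mimicked with a small plain-Javascript function. This is a sketch that assumes a positive increment; it's a stand-in for n[], not the library's implementation.

```javascript
// n(upper), n(lower, upper), or n(lower, upper, increment).
// The lower bound is inclusive and the upper bound is exclusive.
var n = function (a, b, step) {
  var lower = b === undefined ? 0 : a;
  var upper = b === undefined ? a : b;
  var increment = step === undefined ? 1 : step;
  var result = [];
  for (var i = lower; i < upper; i += increment) result.push(i);
  return result;
};

var firstTen = n(10);             // 0 through 9
var fiveToSeven = n(5, 8);        // 5, 6, 7
var quarters = n(0, 1, 0.25);     // 0, 0.25, 0.5, 0.75
var squares = n(1, 11).map(function (x) { return x * x; });  // 1 through 100
```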

Development tools

Most compilers operate offline; that is, they generate standalone code with no references back to the compiler. However, there are some cases where you want to interact with code as it's running. Caterwaul's tracing extension is one way to do this.

The idea behind a trace is that you can observe when (1) an expression is about to be evaluated, and (2) the value it produced after evaluation. Caterwaul does this by inserting hook functions into your source; these functions ideally don't change any behavior (other than making your code a bit slower) and allow you to see what's happening. You get to determine what to do with the observed expressions and values.

Here's an example of defining a function and then tracing it (note that Caterwaul doesn't provide the trace() function used here):

f(n) = n ? n * f(n - 1) : 1

f = trace(f)

f(5)

If you run these statements and scroll back a bit you'll see these huge gnarly expressions with variables like gensym_1_gnhnr4un_bwv17j. This is called a gensym (the term comes from Lisp parlance), and Caterwaul uses variables like this when it needs a unique name. In this case we're seeing gensyms because this is how Caterwaul names its trace functions.

Building your own tracer

Sometimes you want to do something besides listing the expression values. Maybe you want to profile stuff, for example. To do this, you need to construct your own tracer. You do this by calling caterwaul.tracer(), which takes two optional callbacks and returns a trace function. (The trace() function used above is the result of caterwaul.tracer().) The first callback, if it is defined, will be invoked on each syntax tree before that tree is evaluated. The second callback will be invoked on the syntax tree and the value that it produced. Based on this information we can now construct a very simple profiler that counts the number of evaluations of each expression:

counts = {}, trees = {}

count(tree) = trees[tree.id()] = tree -effect [counts[tree.id()] = (counts[tree.id()] || 0) + 1]

profile = caterwaul.tracer(null, count)

Now let's profile something:

is_prime(n) = !(n[2, Math.sqrt(n) + 1] |[n % x === 0] |seq)

takes_a_bit() = n[10000] %[is_prime(x)] /seq

profile(takes_a_bit)()

At this point the profiling data is in counts and trees. counts maps tree IDs to the number of times that tree was evaluated, and trees maps tree IDs to the trees they represent. Let's stash the tree-count pair list into its own variable:

pairs = counts /pairs *[[trees[x[0]], x[1]]] /seq

This is a complete profile, but maybe we don't want that much information at once. Let's just look for trees that represent push() invocations:

pairs %[qs[_x.push(_y)].match(x[0])] /seq

Here we're using some methods provided by syntax trees. We first quote a pattern (which is an instance of Caterwaul's syntax tree class), and then we call its match() method on another tree. match() returns an object if x[0] matches qs[_x.push(_y)] and false otherwise. For the purposes of matching, identifiers that start with underscores can match against any expression. The object that match() returns maps the names of these wildcards to the trees they matched against.

Here are some other queries you could perform:

pairs %[qs[is_prime(_x)].match(x[0])] /seq

pairs %[x[1] > 5000] /seq
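If the wildcard matching seems mysterious, here is a toy version in plain Javascript. It's a sketch under assumptions of my own: the {data, children} tree shape, the node helper, and the encoding of _x.push(_y) are all invented for the example and aren't Caterwaul's syntax tree class.

```javascript
// Hypothetical tree shape: {data: 'operator or name', children: [subtrees]}.
var node = function (data) {
  return {data: data, children: Array.prototype.slice.call(arguments, 1)};
};

// Returns an object mapping wildcard names to the subtrees they matched,
// or false if the pattern doesn't match the tree.
var match = function (pattern, tree, bindings) {
  bindings = bindings || {};
  // Identifiers starting with an underscore match any subtree.
  if (pattern.data.charAt(0) === '_') { bindings[pattern.data] = tree; return bindings; }
  if (pattern.data !== tree.data) return false;
  if (pattern.children.length !== tree.children.length) return false;
  for (var i = 0; i < pattern.children.length; ++i)
    if (!match(pattern.children[i], tree.children[i], bindings)) return false;
  return bindings;
};

// _x.push(_y), encoded here as call(dot(_x, push), _y).
var pattern = node('()', node('.', node('_x'), node('push')), node('_y'));
var pushCall = node('()', node('.', node('xs'), node('push')), node('5'));
var popCall = node('()', node('.', node('xs'), node('pop')), node('5'));

var hit = match(pattern, pushCall);   // {_x: <xs>, _y: <5>}
var miss = match(pattern, popCall);   // false
```

Because a successful match returns an object (which is truthy) and a failed one returns false, the result can be used directly as a filter predicate, as in the queries above.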

Using Caterwaul

Because Caterwaul is written in Javascript, it's very easy to add to your application:

<script src='/path/to/caterwaul.js'></script>
<script src='/path/to/caterwaul-extension.js'></script>
<script src='/path/to/application-code.js'></script>

The application code can then refer to caterwaul to access the compiler.

').addClass('foo') table(tr(td('hi')), tbody) /jquery -> $('').append($('').append($(''))) None of the macroexpansions here rely on opaque syntax refs, so they can all be precompiled. Also, the generated code refers to jQuery rather than $ -- this gives you more flexibility about setting noConflict(). If you need to set noConflict(true) (which removes the global jQuery variable), you can bind it locally to make the DOM stuff work: | div.foo /jquery -where [jQuery = stashed_copy_of_jquery] Notation. Caterwaul didn't previously have a DOM plugin in its core distribution. The html[] macro in previous versions of caterwaul came from montenegro, a web framework I was developing in tandem with caterwaul. However, it's useful to have DOM functionality so I'm including it in the main caterwaul distribution. Most of the syntax is copied from the html[] macro in montenegro: | element.class -> $('').addClass('class') element *foo('bar') -> $('').attr('foo', 'bar') element *!foo('bar') -> $('').data('foo', 'bar') <- new! element /foo('bar') -> $('').foo('bar') element /!foo(bar) -> $('').bind('foo', bar) <- new! +element -> element <- new! element %foo -> foo($('')) element(child) -> $('').append(child /jquery) <- here the /jquery marker indicates that 'child' will be re-expanded element(child1, child2) -> $('').append((child1 /jquery).add((child2 /jquery))) element[child] -> $('').append(child) <- no re-expansion here element[child1, child2] -> $('').append(child1.add(child2)) element > child -> $('').append(child /jquery) element >= child -> $('').append(child) element1, element2 -> (element1 /jquery).add((element2 /jquery)) There's also some new syntax to make certain things easier. 
In particular, I didn't like the way nesting worked in previous versions, so this driver supports some new operators to make it more intuitive: | element1 + element2 -> (element1 /jquery).add((element2 /jquery)) The result of this operator is that you have options as far as nesting is concerned: | div.foo > span.first + span.second, ->

div.bar > span.third + span.fourth

Also, you can now dig through the DOM using HTML selectors. Here's what that looks like: | element >> div.foo -> element.filter('div.foo') element >> _.foo -> element.filter('*.foo') element >>> div.foo -> element.find('div.foo') element << div.foo -> element.parents('div.foo') element >> div.foo /first -> element.filter('div.foo:first') element >> div.foo /contains(x) -> element.filter('div.foo:contains("#{x}")') element >> div.foo + div.bar -> element.filter('div.foo, div.bar') element >> (span >> p) -> element.filter('span p') element >> (span >>> p) -> element.filter('span p') element >> (span > p) -> element.filter('span > p') element >> span[foo] -> element.filter('span[foo]') element >> span[data_bar] -> element.filter('span[data-bar]') <- note conversion of _ to - element >> span[foo=x] -> element.filter('span[foo="#{string_escape(x)}"]') Note that this isn't really intended to be a replacement for jQuery's builtin methods; it's just an easy way to do some simple selection. I highly recommend using native jQuery selectors if you need something more powerful. You shouldn't try to get too elaborate with these; I'm not sure how much stuff jQuery's CSS parser can handle. Also note that CSS3's operator precedence differs from Javascript's. In particular, doing things like div > span + div > code is incorrect because it will be parsed as 'div > (span, div) > code' (though it may render properly as a CSS pattern). It's a good idea to parenthesize in this case, just to communicate your intent to whoever's reading your code. Caterwaul removes the parentheses to make it a valid CSS selector. Unlike the montenegro html[] macro, this one doesn't do any autodetection. The good part about this is that you can create custom HTML elements this way. 
For example: | my_element /jquery -> $('') <- note the conversion of _ to -; this happens in class and attribute names too caterwaul.js_base()(function ($) { $.jquery_macro(language, options) = language.modifier('jquery', this.expand(jquery_expand(match._expression)) -given.match -where [jquery_expand = $.jquery(options)]); $.jquery(options) = $.clone().macros(jquery_macros, string_macros, search_macros) -effect [it.init_function(tree) = this.macroexpand(anon('J[_x]').replace({_x: tree}))] Transforms. There are a lot of stages here, but most of them are fairly trivial. The first, J[], is used to indicate that something needs to be expanded under the jquery grammar. This is responsible for turning elements into jQuery calls, dot operators into classes, etc, and it does most of the heavy lifting. The other large stage is P[], which converts the pattern language into a jQuery CSS selector. The small stages are S[], which just turns something into a string with underscore-to-dash conversion; TS[], which turns something into a tag-string (e.g. TS[foo] = ""); and PS[], which quotes a compiled pattern. -where [jq = options && options.jquery_name || 'jQuery', anon = $.anonymizer('J', 'TS', 'S', 'P', 'PS'), rule(p, e) = $.macro(anon(p), e.constructor === Function ? this.expand(e.call(this, match)) -given.match : anon(e)), hyphenate(s) = s.replace(/_/g, '-'), p = bind [p_pattern = anon('P[_thing]')] in p_pattern.replace({_thing: node}) -given.node, jquery_macros = [rule('J[_element]', given.match [match._element.is_constant() || match._element.length ? 
wrap_in_jquery(match) : become_dom_node(match)] -where [dom_node_template = anon('#{jq}(TS[_element])'), jquery_template = anon('#{jq}("" + (_element) + "")'), become_dom_node(match) = dom_node_template.replace(match), wrap_in_jquery(match) = jquery_template.replace(match)]), rule('J[_element._class]', 'J[_element].addClass(S[_class])'), rule('J[_element *_attr(_val)]', 'J[_element].attr(S[_attr], _val)'), rule('J[_element *!_name(_val)]', 'J[_element].data(S[_name], _val)'), rule('J[_element /_method(_args)]', 'J[_element]._method(_args)'), rule('J[_element /!_event(_args)]', 'J[_element].bind(S[_event], _args)'), rule('J[_element %_function]', '_function(J[_element])'), rule('J[_element(_children)]', 'J[_element].append(J[_children])'), rule('J[_element[_children]]', 'J[_element].append(_children)'), rule('J[_element > _child]', 'J[_element].append(J[_child])'), rule('J[_element >= _child]', 'J[_element].append(_child)'), rule('J[_element1, _element2]', 'J[_element1].add(J[_element2])'), rule('J[_element1 + _element2]', 'J[_element1].add(J[_element2])'), rule('J[_element >> _pattern]', 'J[_element].filter(PS[_pattern])'), rule('J[_element >>> _pattern]', 'J[_element].find(PS[_pattern])'), rule('J[_element << _pattern]', 'J[_element].parents(PS[_pattern])'), rule('J[(_element)]', '(J[_element])'), rule('J[[_element]]', '[J[_element]]'), rule('J[+_expression]', '_expression')], string_macros = [rule('TS[_identifier]', string('<#{hyphenate(match._identifier.data)}>') -given.match), rule('S[_identifier]', string( hyphenate(match._identifier.data)) -given.match), rule('PS[_identifier]', string(this.expand(p(match._identifier)).data) -given.match)] -where [string(s) = new $.syntax('"' + s.replace(/\\/g, '\\\\').replace(/"/g, '\\"') + '"')], search_macros = [rule('P[_element]', new $.syntax(hyphenate(match._element.data -re [it === '_' ? 
'*' : it])) -given.match), rule('P[_element._class]', new $.syntax('#{this.expand(p(match._element)).data}.#{hyphenate(match._class.data)}') -given.match), rule('P[_element[_attributes]]', new $.syntax('#{this.expand(p(match._element)).data}[#{this.expand(p(match._attributes))}]') -given.match), rule('P[_attribute = _value]', new $.syntax('#{this.expand(p(match._attribute)).data}="#' + '{#{interpolated(match._value)}}"') -given.match), rule('P[(_element)]', 'P[_element]'), // No paren support rule('P[_element1 + _element2]', binary(', ')), rule('P[_element1, _element2]', binary(', ')), rule('P[_element1 >> _element2]', binary(' ')), rule('P[_element1 >>> _element2]', binary(' ')), rule('P[_element1 > _element2]', binary(' > ')), rule('P[_element1(_element2)]', binary(' > ')), rule('P[_element /_selector]', new $.syntax('#{this.expand(p(match._element)).data}:#{hyphenate(match._selector.data)}') -given.match), rule('P[_element /_selector(_value)]', new $.syntax('#{this.expand(p(match._element)).data}:#{hyphenate(match._selector.data)}("#' + '{#{interpolated(match._value)}")') -given.match)] -where [interpolated(node) = '(#{node.toString()}).replace(/(\\)/g, "$1$1").replace(/(")/g, "\\$1")', binary(op)(match) = new $.syntax('#{this.expand(p(match._element1)).data}#{op}#{this.expand(p(match._element2)).data}')]]})(caterwaul); __ meta::sdoc('js::precompile', <<'__'); Offline precompilation. Uses caterwaul's precompile() method. var code = require('fs').readFileSync(process.argv[2], 'utf8'); require('fs').writeFileSync(process.argv[2].replace(/\.js$/, '.pre.js'), caterwaul.precompile('function () {' + code + '}').toString().replace(/^[\s\n]*function\s*$[^)]*$[\s\n]*\{((?:.|\n)*)\}[\s\n]*$/, '$1'), 'utf8'); __ meta::sdoc('js::test/sanity-check', <<'__'); caterwaul.test(function () { var threw = false; 3 -should_be- 3; try {3 -should_be- 4} catch (e) {threw = true} threw -should_be- true; }); __ meta::sdoc('js::test/trace', <<'__'); Trace tests. 
Because tracing is done at runtime we can easily write a test for it. These are some fairly trivial tests to make sure nothing major is broken. caterwaul.test(function () { var observed_trees = []; var observed_values = []; var tree_stack = []; var last = function (xs) {return xs[xs.length - 1]}; var t = caterwaul.tracer(function (t) {tree_stack.push(t); observed_trees.push(t)}, function (t, v) {tree_stack.pop() -should_be- t; observed_values.push(v)}); tree_stack.length -should_be- 0; var f = function (x) {return x}; var traced_f = t(f); traced_f(5); tree_stack.length -should_be- 0; observed_values[0] -should_be- traced_f; observed_values[1] -should_be- 5; observed_trees[0].toString() -should_be- caterwaul.parse(f).toString(); observed_trees[1].toString() -should_be- caterwaul.parse('x').toString(); observed_values.length -should_be- 2; observed_trees.length -should_be- 2; // Next step. Make sure Caterwaul can compile its own very gnarly parse() function. // This function won't work, but it should compile at least. t(caterwaul.parse); }); __ meta::sdoc('js::tools/precompile', <<'__'); Caterwaul precompiler | Spencer Tipping Licensed under the terms of the MIT source code license Usage: node precompile.js file.js This will produce a precompiled file called 'file.pre.js'. - include pp::js::caterwaul - include pp::js::extensions/std - include pp::js::extensions/dev - include pp::js::precompile __ meta::sdoc('js::web/code-snippets', <<'__'); Code snippet initialization. This runs after the page is fully loaded. The idea is to setup clickability for each code snippet. setTimeout(linkify_code_snippets, 0), where [linkify_snippet(s) = s.click(send_code_to_prompt), send_code_to_prompt() = $('.shell .prompt .input').val($(this).text()) -effect- $('.shell').click(), linkify_code_snippets() = $('pre.code') *[linkify_snippet($(x))] /seq]; __ meta::sdoc('js::web/header', <<'__'); Page header. This is basically just a navigation container. 
var page_header = div.header(div.title(span.caterwaul('caterwaul'), span.js('the ', span.accent('edge'), ' of javascript'))) -jquery; __ meta::sdoc('js::web/main', <<'__'); Caterwaul JS web interface | Spencer Tipping Licensed under the terms of the MIT source code license $(caterwaul.js_ui(caterwaul.js_all())(function () { var original_html = $('body').html(), original_tutorial = $('#tutorial'), original_styles = $('style, link[rel="stylesheet"]'); - pinclude pp::js::web/header - pinclude pp::js::web/shell - pinclude pp::js::web/code-snippets $('head').append(title('caterwaul js') -jquery); $('body').empty().append(div.page[page_header, original_tutorial, shell] /jquery); original_styles.appendTo('head')})); __ meta::sdoc('js::web/shell', <<'__'); var shell = shell.append(history_container(), shell_prompt()). click(shell.find('.prompt .input').focus() -given.e) -where [shell = div.shell -jquery, history_container() = div.history -jquery, history_entry_for(s) = pre.entry(span.accent('>'), span.command /text(s)) -jquery, history_result_for(o) = pre.result /text('' + o) -jquery, history_log_for(o) = pre.log /text('' + o) -jquery, history_error_for(e) = pre.error /text('' + e) -jquery, realign() = setTimeout(input.css({width: input.parent().width() - (input.prev().width() + 10)}) -where [input = shell.find('.prompt .input')] -given.nothing, 10), log(x) = shell.children('.history').append(x) -then- realign() -returning- x, context = {expand: given.nothing in shell.animate({left: 0, right: 0}, realign), collapse: given.nothing in shell.animate({left: 600, right: 50}, realign), clear: given.nothing in shell.children('.history').empty() -then- realign() -returning- '', caterwaul: caterwaul.js_ui(caterwaul.js_all()), show: given.tree in context.caterwaul.macroexpand(tree), history: [], history_n: 0, help: given.nothing in 'available variables:\n#{(context /keys /seq).join("\n")}', log: given.thing in log(history_log_for(thing)), trace: given.f in caterwaul.tracer(null, 
                                                given[t, v] in context.log('#{t} = #{v}'))(f),
                                   it:        null} -effect [it.context = it],
          run_command(c)        = log(history_entry_for(c)) -then- log(history_result_for(context.it = context.caterwaul(c, context)))
                                  /failover [log(history_error_for(context.it = e))],
          shell_prompt()        = div.prompt[prompt, input] -jquery
                                  -effect- setTimeout(realign, 10)
                                  -effect- it.find('span.prompt').click($(this).siblings('.input').focus() -given.e)
                                  -effect- it.find('.input').keydown(realign() -then- history_prev() /effect [e.preventDefault()] /when [e.which === 38]
                                                                     -otherwise- history_next() /effect [e.preventDefault()] /when [e.which === 40]
                                                                     -otherwise- run_it() /effect [e.preventDefault()] /when [e.which === 13]
                                                                     -otherwise- true -given.e)
                                  -where [input           = input.input /val('help()') -jquery,
                                          prompt          = span.accent('>') -jquery,
                                          h_index         = 0,
                                          history_prev()  = (h[h_index] = input.val()) -when [h_index < history_n] -then- input.val(h[--h_index]) -when [h_index > 0] -where [h = context.history],
                                          history_next()  = (h[h_index] = input.val()) -when [h_index < history_n] -then- input.val(h[++h_index]) -when [h_index < history_n] -where [h = context.history],
                                          history_add(s)  = history_n = h_index = context.history.push(s),
                                          scroll_to_end() = setTimeout(shell.scrollTop(shell.children(':last') -re [shell.scrollTop() + it.position().top + it.height()]) -given.nothing, 0),
                                          run_it()        = history_add(t) -then- run_command(t) -then- input.val('') -then- scroll_to_end() -when.t -where [t = input.val()]]];
__
meta::sdoc('web/tutorial', <<'__');
Introduction.
Caterwaul is a pure Javascript compiler that lets you change the semantics of functions. To do this it implements a modular decompiler, macroexpander, and compiler that allow you to manipulate code in a first-class way. It also comes with several macro (Lisp-style, not C-style) libraries to make Javascript more fun (though you can easily disable them and/or write your own).

A shell is available to interactively use Caterwaul while reading the tutorial below.

Javascript extensions.
Caterwaul's core macro set starts by extending Javascript syntax in some helpful ways. In particular, it enables quick function assignment and Ruby-style string interpolation:

f(x) = x + 1

String.prototype.say_hi() = 'hi from #{this}!'

Caterwaul translates these expressions into this:

| f = function (x) {
    return x + 1;
  };
  String.prototype.say_hi = function () {
    return 'hi from ' + (this) + '!';
  };

String interpolation and function assignment are the only irregular syntactic forms provided by Caterwaul. Everything else is implemented as a regular form called a modifier.

Modifiers.
A modifier is a word that is used with an operator to modify a bit of syntax. For example, Caterwaul provides a modifier called when to execute things conditionally:

log('hi') -when['foo'.length === 3]

There are two parts to a modifier. The first is the operator you use with it (in this case minus), and the second is the modifier and any arguments it takes. The operator is very important; it determines how much stuff you're modifying. For example:

log('hi'), log('again') -when[1 === 2]

Here the when[1 === 2] only modifies log('again') because minus has much higher precedence than the comma operator. However, Caterwaul lets you use several other operators to change this:

log('hi'), log('again'), when[1 === 2]

In this case the when[1 === 2] modifies both log statements. The reason for this is kind of subtle: comma left-associates, so the first comma was collapsed into a single syntax node that then became the left-hand side of the second comma. Because Caterwaul operates on the structure of your code, it groups both log statements into the conditional.

Modifier operators.
There are seven different operators you can use in conjunction with a modifier. The first six are listed from highest to lowest precedence; the last one uses explicit grouping instead:

The slash. For example, log('hi') /when [true]. I use this when I need something tighter than a minus.
The minus. For example, log('hi') -when [true]. It also comes in another form: log('hi') -when- true. I use this most of the time because it seems easier to read.
The in operator. For example, bind [x = 10] in x + 1. in has the same precedence as < and >, which is lower than the arithmetic operators. As a result, it's useful when you're binding variables or creating functions around simple expressions.
The <> operators. These are used around a modifier: log('hi') no_logging. This has the same precedence as in and other relational operators.
The | operator. This is the lowest-precedence regular operator; the only things lower are &&, ||, ?:, assignment, and the comma.
The , operator. This is the lowest-precedence operator in Javascript. It can be dangerous to use because it left-associates; for example, f(x, y, z, where [z = 10]) will invoke f on just one parameter, since the where gobbles everything to its left. (Using a | here would fix the problem.)
The [] operator. This starts the precedence hierarchy over by using explicit grouping. For example, bind[x = 10][log(x)].
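To make the grouping concrete, here is a plain-Javascript sketch of how the two comma examples from the Modifiers section would behave, assuming the x -when- y -> y && x semantics listed under conditional modifiers. The names calls, tight, and loose are made up for illustration, and this is hand-written code rather than Caterwaul's actual output.

```javascript
// Records each call so the grouping can be inspected afterwards.
var calls = [];
var log = function (x) { calls.push(x); return x; };

// log('hi'), log('again') -when[1 === 2]
// Minus binds tighter than comma, so only the second log is conditional.
var tight = function (cond) { return (log('hi'), cond && log('again')); };

// log('hi'), log('again'), when[1 === 2]
// Comma left-associates, so both logs end up inside the conditional.
var loose = function (cond) { return cond && (log('hi'), log('again')); };

tight(1 === 2);
var after_tight = calls.slice();   // only 'hi' was logged

calls = [];
loose(1 === 2);
var after_loose = calls.slice();   // nothing was logged
```

With the minus form, 'hi' is still logged even though the condition is false; with the comma form, neither call runs.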

Conditional modifiers.
when is one of the five conditional modifiers you can use. The others are unless, otherwise, when_defined, and unless_defined. The semantics and return values are:

| x -when- y           -> y && x
  x -unless- y         -> !y && x
  x -otherwise- y      -> x || y
  x -when_defined- y   -> y != null && x
  x -unless_defined- y -> y == null && x

Binding modifiers.
These let you define locally-scoped variables. There is in fact only one such modifier, where, but it can also be called bind to read more naturally:

x -where [x = 10]
bind [x = 10] in x
bind [f(x) = x + 1] in f(7)

Despite the name, bind has nothing to do with this binding inside functions. (Though Caterwaul does provide some modifiers to handle that.) Previous versions of Caterwaul called this macro let, but it's a reserved word in recent versions of Javascript.

Function modifiers.
There are two words that create functions. One is given, which creates a regular function. The other is bgiven, which binds the function to the this where it was defined. For example:

given[x] in x + 1
x + 1 -given[x]
f.call(10) -where [f = this -given- x]
f.call(10) -where [f = this -bgiven- x]

There's a shorthand you can use if you just have a single operand for a modifier:

x + 1 -given.x
given.x in x + 1
given.x [x + 1]

Side-effecting modifiers.
These make it easy to manipulate values and return things without using an explicit variable. We do this in English using pronouns, and Caterwaul binds the variable it to refer to "the thing that we're working with."

There are two ways to create a side-effect. One is to return the side-effecting expression and the other is to return the original value. For example, suppose you want to write a function hash(k, v) that returns a hash h such that h[k] === v. In plain Javascript you'd write this:

| var hash = function (k, v) {
    var result = {};
    result[k] = v;
    return result;
  };

However, the amount of typing required is much larger than the complexity of the problem.
We want to return an object after applying a side-effect to it; to do this with Caterwaul we would use the effect modifier (also called se, which stands for "side-effect"):

hash(k, v) = {} -effect [it[k] = v]

This style of side-effects returns the original expression. Sometimes, though, you want to return the result of the side-effect rather than the original. For example, here's a zero-division check in plain Javascript:

| var x_over_yp1 = function (x, y) {
    var y_plus_1 = y + 1;
    return y_plus_1 === 0 ? 0 : x / y_plus_1;
  };

Here's the same function using a returning side-effect:

x_over_yp1(x, y) = y + 1 -returning [it === 0 ? 0 : x / it]

The returning modifier is also called then and re. Note that it doesn't actually use return (i.e. it won't jump out of stuff to return from a function). It just returns a value locally as any other expression would. This means you can chain them along:

log('hi') -then- log('again') -then- log('!')

Side-effecting won't impact the evaluation order of your code. That is, x -effect- y and x -returning- y will always evaluate x before y.

Looping modifiers.
These repeatedly execute an expression. There are four looping modifiers, each of which has only one name. over is used with arrays; it forms a map. For example:

log(it) -over- [1, 2, 3]

This not only invokes log on each element, but returns an array of the results returned by each invocation of log(it).

Similar to over are over_keys and over_values, each of which operates on an object. Like over, each one returns an array of results (though unlike over the array will not have any particular order). For example:

log(it) -over_keys- {foo: 'bar', bif: 'baz'}
log(it) -over_values- {foo: 1, bar: 2}

The final modifier is until, which does exactly what it sounds like:

x = 0, log(x) -until [++x >= 10]

You can use these basic modifiers, but if you plan on doing any heavy lifting you should check out the sequence library below. (I rarely use anything else for iterative functions.)

Quotation.
Most people won't use this, but it's handy if you're doing heavy-duty syntax analysis or writing complex macros. The standard library includes an obscure modifier called qs that you can use to quote a piece of code. Quotation is basically grabbing the literal syntax rather than evaluating it normally. For example:

qs[foo + bar]
qs[foo + bar].data
qs[foo + bar].length
qs[foo + bar][0]

Quotation is an idea that comes from Lisp and is handled similarly by Caterwaul. (The only difference is that Caterwaul returns its own n-ary syntax tree format instead of cons trees.)

A variant, qse, macroexpands the quoted code before returning it as a syntax tree. For example:

qse[log(foo) -unless[true]]
log(foo) -unless[true], qse

Other modifiers.
There are a few more modifiers that I threw in to the standard library to make some edge cases easier.

'oh no!' -wobbly
'another error!' -chuck
null.foo -failover- log('got #{e}')
safely [alert(e)] in undefined.bar

Sequence library.
This is probably the gnarliest part of Caterwaul, but in my opinion it's also the most useful. The sequence library provides a modifier called seq that reinterprets some syntax within an APL-like domain-specific language. It generates very efficient code and lets you express maps, folds, cartesian products, zips, etc, with very little effort. For instance, suppose we want an array of the first 10 squares. Using until, the algorithm looks like this:

bind [i = 0] in i*i -until [++i > 10]

Using the sequence library looks like this:

n[1, 11] *[x * x] /seq

Mapping and iterating.
The * operator is responsible for mapping, iterating, and flat-mapping. It's fairly easy to use; you just "multiply" a sequence by a bracketed expression. * will create a variable called x and evaluate your expression for each element in the sequence. It then collects these results and returns a new array. For example:

seq in [1, 2, 3] *['x = #{x}']

You don't have to use just arrays. You can use anything with a .length and [0] ...
[n - 1] attributes. One of the most common non-array collections I use is a jQuery selector (just be sure to wrap x again so that you're not dealing with a plain DOM node):

seq in $('div') *[$(x).attr('class')]

Alternative forms.
Most operators have an alternative form that does something similar to the original. You specify this form by using a ! after the operator. The alternative form of * is used to iterate without collecting the results; doing this returns the original array. For example:

seq in [1, 2, 3] *![log(x)]

The third use of * is flat-mapping, which is denoted by writing *~!. For example:

seq in [1, 2, 3] *~![[x, x + 1]]

Like the original form, these alternative forms can be combined with any of the operator features below.

Operator features.
The sequence library uses operators to describe operations on arrays. Most of them are regular binary infix operators like + and *, though a few of them have names (such as n[] above). Despite the wide array of operators supported, there is a high degree of regularity among them. Each operator that takes a block (like * does) has several options that can be set to change the way it interprets the block.

Sequence interpretation.
Normally the expression inside [] is interpreted as a regular Javascript expression. But sometimes you want to remain in sequence context so that you don't have to explicitly modify the expression. To do that, you prefix the [] with a ~:

seq in [[1], [2], [3]] *~[x *[x + 1]]

Variable renaming.
In the example above we lost access to the outer x due to shadowing. To avoid this problem, the sequence language lets you rename any variable by prefixing the [] with a new variable name:

seq in [1, 2, 3] *y[y + 1]

You can use both of these options at the same time, yielding this:

seq in [[1], [2], [3]] *~y[y *[x + 1]]

Note that you can't say *y~[...], as this is invalid Javascript syntax (~ is always a unary operator).

Filtering.
The filtering family of operators is denoted by %.
For instance, here's a way to get multiples of three:

seq in [1, 2, 3] %[x % 3 === 0]

Alternative forms.
Negation is so high precedence that it's often difficult to work it into a form without adding parentheses. The alternative form of % negates the predicate:

seq in [1, 2, 3] %![x % 3]

The other alternative form of % is a simultaneous map/filter. The idea is to return the expression value when it's truthy and drop the element otherwise. For example, we can get the squares of all negative elements this way:

seq in [1, -2, -3, 4] %~![x < 0 && x * x]

Folding.
You can fold stuff through a binary expression by using the / family of operators. / has two forms: left fold (the default), and right fold (written as /!). For example, here is how you might sum a bunch of numbers:

seq in [1, 2, 3] /[x + x0]

Since + is associative it doesn't matter which direction the fold goes. It becomes obvious, however, if we interpolate the values into a string:

seq in [1, 2, 3] /['[#{x}, #{x0}]']
seq in [1, 2, 3] /!['[#{x}, #{x0}]']

Notice that for folding we have a new variable x0. There are actually a few variables you have access to depending on what you're doing. Inside any block you'll have x, xi (the current index), and xl (the length of the original sequence). x0 is available only when folding. Each of these changes uniformly if you rename the variable; so for instance:

seq in [1, 2, 3] /bar[bar + bar0 + bari + barl]

Quantification.
The sequence library provides existential quantification on arrays. Each of these uses a block that acts as a predicate. So, for instance, to determine whether any element in an array is positive:

[-4, -5, 10, 2] |[x > 0] |seq

The | operator returns the first truthy value generated by the expression (not just true or false), so you can use it to detect things too.
This block causes the sequence comprehension to return not only whether an element is positive, but if so the first such element will be returned:

[-4, -5, 10, 2] |[x > 0 && x] |seq
[-4, -5, 10, 2] |[x -when[x > 0]] |seq

We can also use this construct to return the index of the first matching element. Because an index of 0 is falsy, we'll have to add one (so 0 is the not-found value rather than -1):

[-4, -5, 10, 2] |[xi + 1 -when[x > 0]] |seq

Combination.
There are three ways you can combine things. The most obvious is concatenation, written +:

seq in [1, 2, 3] + [4, 5, 6]

Less obvious are zipping, written ^, and the inner product, written -. Because ^ has lower precedence than in, we have to switch to a lower-precedence modifier form for seq. For example:

[1, 2, 3] ^ [4, 5, 6] |seq

The inner product takes every possible pairing of elements from the two sequences:

seq in [1, 2, 3] - [4, 5, 6]

Each of these operators has lower precedence than *, /, and % (all of which have equal precedence), so they can be used without parentheses.

Objects.
A really useful and important feature of the sequence library is that it works with objects very easily. It has four operators, /keys, /values, /pairs, and |object, that can convert between objects and arrays.

You can pull an array of the keys or values of an object (not in any particular order of course) by using /keys and /values. For example:

window /keys -seq
jQuery /values -seq

More interesting is the /pairs operator. This pulls out key-value pairs as two-element arrays:

{foo: 'bar', bif: 'baz'} /pairs -seq

Its inverse is the |object operator, which turns an array of those pairs back into an object:

[['foo', 'bar'], ['bif', 'baz']] |object |seq

Note the differing precedences of /keys etc. and |object. This is intentional. The rationale is that you rarely manipulate objects as objects in sequence comprehensions, since the sequence library has no useful operators for objects other than unpacking.
Therefore, objects come from various other values and enter a sequence comprehension, which may at the very end zip an intermediate result into a final object return value. I may change this in the future as I use it more, but any changes will be backwards-compatible.

Numerical iteration.
Within a sequence comprehension you have access to the n[] operator, which generates arrays of evenly-spaced numbers. It has three uses. When invoked on one argument it returns integers between 0, inclusive, and the number, exclusive. When invoked with two arguments the first becomes the inclusive lower bound and the second is the exclusive upper bound. Adding a third argument changes the increment from its default value of 1. For example:

n[10] -seq
n[5, 8] -seq
n[0, 1, 0.25] -seq

Development tools.
Most compilers operate offline; that is, they generate standalone code with no references back to the compiler. However, there are some cases where you want to interact with code as it's running. Caterwaul's tracing extension is one way to do this.

The idea behind a trace is that you can observe when (1) an expression is about to be evaluated, and (2) the value it produced after evaluation. Caterwaul does this by inserting hook functions into your source; these functions ideally don't change any behavior (other than making your code a bit slower) and allow you to see what's happening. You get to determine what to do with the observed expressions and values. Here's an example of defining a function and then tracing it (note that Caterwaul doesn't provide the trace() function used here):

f(n) = n ? n * f(n - 1) : 1
f = trace(f)
f(5)

If you run these statements and scroll back a bit you'll see these huge gnarly expressions with variables like gensym_1_gnhnr4un_bwv17j. This is called a gensym (the term comes from Lisp parlance), and Caterwaul uses variables like this when it needs a unique name. In this case we're seeing gensyms because this is how Caterwaul names its trace functions.
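As a rough mental model of the hooks described above, the sketch below instruments the factorial example by hand. It is plain Javascript written for illustration, not the code Caterwaul generates; hook, before, and after are hypothetical names, and the strings stand in for the syntax trees a real tracer would receive.

```javascript
var events = [];
var before = function (tree)        { events.push('enter ' + tree); };
var after  = function (tree, value) { events.push(tree + ' = ' + value); };

// Wraps one expression: announce it, evaluate it, report the value, and pass
// the value through unchanged so the surrounding code behaves as before.
var hook = function (tree, thunk) {
  before(tree);
  var value = thunk();
  after(tree, value);
  return value;
};

// Hand-instrumented version of f(n) = n ? n * f(n - 1) : 1
var f = function (n) {
  return hook('n ? n * f(n - 1) : 1', function () {
    return hook('n', function () { return n; })
      ? hook('n * f(n - 1)', function () { return n * f(n - 1); })
      : 1;
  });
};

var result = f(2);
```

Because hook returns the value it observed, the instrumented function still computes the same result; the only visible difference is the list of events collected along the way.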
Building your own tracer.
Sometimes you want to do something besides listing the expression values. Maybe you want to profile stuff, for example. To do this, you need to construct your own tracer. You do this by calling caterwaul.tracer(), which takes two optional callbacks and returns a trace function. (The trace() function used above is the result of caterwaul.tracer().)

The first callback, if it is defined, will be invoked on each syntax tree before that tree is evaluated. The second callback will be invoked on the syntax tree and the value that it produced. Based on this information we can now construct a very simple profiler that counts the number of evaluations of each expression:

counts = {}, trees = {}
count(tree) = trees[tree.id()] = tree -effect [counts[tree.id()] = (counts[tree.id()] || 0) + 1]
profile = caterwaul.tracer(null, count)

Now let's profile something:

is_prime(n) = !(n[2, Math.sqrt(n) + 1] |[n % x === 0] |seq)
takes_a_bit() = n[10000] %[is_prime(x)] /seq
profile(takes_a_bit)()

At this point the profiling data is in counts and trees. counts maps tree IDs to the number of times that tree was evaluated, and trees maps tree IDs to the trees they represent. Let's stash the tree-count pair list into its own variable:

pairs = counts /pairs *[[trees[x[0]], x[1]]] /seq

This is a complete profile, but maybe we don't want that much information at once. Let's just look for trees that represent push() invocations:

pairs %[qs[_x.push(_y)].match(x[0])] /seq

Here we're using some methods provided by syntax trees. We first quote a pattern (which is an instance of Caterwaul's syntax tree class), and then we call its match() method on another tree. match() returns an object if x[0] matches qs[_x.push(_y)] and false otherwise. For the purposes of matching, identifiers that start with underscores can match against any expression. The object that match() returns maps the names of these wildcards to the trees they matched against.
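The wildcard behavior described above can be sketched with a toy matcher. Everything here is an assumption made for illustration: node() and its {data, children} layout are hypothetical stand-ins rather than Caterwaul's syntax tree class, and this match() is a simplified imitation that does not check repeated wildcards for consistency.

```javascript
// Toy tree node: data is the operator or identifier, children are subtrees.
var node = function (data) {
  return {data: data, children: Array.prototype.slice.call(arguments, 1)};
};

// Returns an object mapping wildcard names to the subtrees they matched,
// or false if the pattern and the tree differ.
var match = function (pattern, tree, bindings) {
  bindings = bindings || {};
  if (pattern.data.charAt(0) === '_' && pattern.children.length === 0) {
    bindings[pattern.data] = tree;
    return bindings;
  }
  if (pattern.data !== tree.data || pattern.children.length !== tree.children.length) return false;
  for (var i = 0; i < pattern.children.length; ++i)
    if (!match(pattern.children[i], tree.children[i], bindings)) return false;
  return bindings;
};

// Pattern shaped like _x.push(_y): a call node whose first child is a '.' node.
var pattern = node('()', node('.', node('_x'), node('push')), node('_y'));
var tree    = node('()', node('.', node('xs'), node('push')), node('+', node('a'), node('b')));

var hit  = match(pattern, tree);
var miss = match(pattern, node('+', node('a'), node('b')));
```

Here hit maps _x to the xs node and _y to the a + b node, while miss is false because the top-level nodes differ.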
Here are some other queries you could perform:

pairs %[qs[is_prime(_x)].match(x[0])] /seq
pairs %[x[1] > 5000] /seq

Using Caterwaul.
Because Caterwaul is written in Javascript, it's very easy to add to your application:

|

The application code can then refer to caterwaul to access the compiler.
__
meta::template('comment', '\'\'; # A mechanism for line or block comments.');
meta::template('eval', <<'__');
my $result = eval $_[0];
terminal::warning("Error during template evaluation: $@") if $@;
$result;
__
meta::template('failing_conditional', <<'__');
my ($commands) = @_;
my $should_return = $commands =~ / if (.*)$/ && ! eval $1;
terminal::warning("eval of template condition failed: $@") if $@;
$should_return;
__
meta::template('include', <<'__');
my ($commands) = @_;
return '' if template::failing_conditional($commands);
join "\n", map retrieve($_), split /\s+/, $commands;
__
meta::template('pinclude', <<'__');
# Just like the regular include, but makes sure to insert paragraph boundaries
# (this is required for SDoc to function properly).

my ($commands) = @_;
return '' if template::failing_conditional($commands);
my $text = join "\n\n", map retrieve($_), split /\s+/, $commands;
"\n\n$text\n\n";
__
meta::template('script-include', <<'__');
my ($name) = @_;
my $s = 'script';
my $script = retrieve($name);
"<$s>\n$script\n</$s>";
__
meta::template('style-include', <<'__');
my ($name) = @_;
my $s = 'style';
my $style = retrieve($name);
"<$s>\n$style\n</$s>";
__
internal::main();

__END__
