Seminar

Why something like omath anyway? -- it's something like how mathematicians do calculations... -- integrals,

-- not like procedural programming languages; you can always write foo[bar]; it only evaluates if there's a pattern. -- similarly for typing

Abstract structure of omath
In its most general form, the 'omath' system is simply a framework for three components:


 * 1) a pattern matching algorithm
 * 2) a mechanism for storing replacement rules
 * 3) an evaluation strategy

I'll try to describe what each of these components 'should' look like, and then how together they are everything we need for a simple computer algebra system. This framework is flexible enough to accommodate some very different looking applications. Two extremes are a system emulating Mathematica, and a regular expression engine.

To begin, I'll describe the three basic notions we'll need.


 * 1) expressions
 * 2) patterns
 * 3) replacement rules

(In some sense, patterns are actually redundant if we allow a more general notion of 'replacement rule', but for most purposes a replacement rule is simply a pair: a pattern along with an expression.)

It doesn't really matter what you take expressions to be; they could be any class of object at all. (Strings, integers, finite rooted trees of objects from some set, etc.) But in nearly all interesting examples, we'll ask that it's possible to single out certain expressions, and call them 'symbol expressions'. The name tells us what these are for; they're meant to correspond to our (vague) notion of a variable.

In the really interesting example of this framework, namely 'omath' as a computer algebra system, 'symbol expressions' consist of a string; something like "x", "sin" or "JonesPolynomial". There are other types of expressions, representing integers, real numbers, and text, and then 'compound expressions'. Each compound expression has a head (another expression), and leaves (a list of other expressions), and we denote such expressions like so: head[leaf1, leaf2, leaf3]. This isn't particularly important for now, however!

Now, what's a pattern? Abstractly, a pattern is a function from expressions to sequences of 'bindings', where a 'binding' is a collection of pairs of symbol expressions and expressions. (It's not necessarily a function in the mathematical sense, but in the programming sense; there's no need to insist that the answer comes out the same every time, or that it doesn't depend on external, unspecified conditions.) Essentially, this function describes all the possible ways a given expression can 'be matched by' or 'match against' the pattern.
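To make this concrete, here's a minimal sketch of the abstract definition in Python. The encoding of expressions (symbols as strings, compounds as tuples) and all the names here are my own choices for this example, not part of omath:

```python
# A pattern is a callable taking an expression and returning a list of
# bindings; each binding is a dict from symbol names to expressions.
# Expressions: symbols are strings, integers are ints, and a compound
# expression is a tuple (head, leaf1, leaf2, ...).

def literal(e):
    """A pattern matching exactly the expression e, binding nothing."""
    return lambda expr: [{}] if expr == e else []

def bind(name):
    """A pattern matching any expression, binding it to `name`."""
    return lambda expr: [{name: expr}]

def compound(head_pat, *leaf_pats):
    """Match a compound expression leafwise, merging compatible bindings."""
    def match(expr):
        if not (isinstance(expr, tuple) and len(expr) == 1 + len(leaf_pats)):
            return []
        results = head_pat(expr[0])
        for pat, leaf in zip(leaf_pats, expr[1:]):
            merged = []
            for b in results:
                for b2 in pat(leaf):
                    # bindings must agree on any shared symbols
                    if all(b.get(k, v) == v for k, v in b2.items()):
                        merged.append({**b, **b2})
            results = merged
        return results
    return match

# The pattern f(x): 'f applied to anything, bound to x'
f_of_x = compound(literal("f"), bind("x"))
print(f_of_x(("f", 2)))   # one binding: [{'x': 2}]
print(f_of_x(("g", 7)))   # no match: []
```

Note how `compound` insists that overlapping bindings agree; that consistency requirement is what gives a repeated symbol in a pattern its meaning.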

Some simple examples should explain everything! For now at least I'll use 'normal mathematical notation', and tell you in a moment when I switch over to mathematica-like syntax.

When we write $$f(x) = x^2$$ (intending a definition), the left hand side is a pattern (it says something of the form 'f applied to anything'). Let's consider how this pattern matches against various expressions -

$$f(x) <- f(2) = \{ x -> 2 \}$$

$$f(x) <- g(7) = \{ \}$$

That's about all there is to say about this pattern! Either it matches exactly one way, producing a binding for x, or it doesn't match at all.

Let's try a more complicated pattern. Let's say we wanted to express that the determinant of a matrix with a repeated row was zero. We might reasonably write $$det(..., x, ..., x, ...) = 0$$. Then

$$det(..., x, ..., x, ...) <- det(u, u, v, w, v) = \{ x -> u, x -> v \}$$

while

$$det(..., x, ..., x, ...) <- det(a, b, c) = \{ \}$$
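This particular pattern can be hand-rolled as a short Python sketch (the tuple encoding of expressions and the names are assumptions for this example): each pair of equal leaves contributes one binding for x.

```python
# Sketch of det(..., x, ..., x, ...): given a compound expression,
# return one binding of x per pair of equal leaves.
# Expressions: symbols are strings, compounds are tuples (head, leaves...).

from itertools import combinations

def repeated_leaf_pattern(head, name):
    def match(expr):
        if not (isinstance(expr, tuple) and expr and expr[0] == head):
            return []
        leaves = expr[1:]
        return [{name: a} for a, b in combinations(leaves, 2) if a == b]
    return match

det_repeated = repeated_leaf_pattern("det", "x")
print(det_repeated(("det", "u", "u", "v", "w", "v")))  # [{'x': 'u'}, {'x': 'v'}]
print(det_repeated(("det", "a", "b", "c")))            # []
```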

There are plenty of sensible ways in which a pattern can match in multiple ways. I'll explain an example showing some of Mathematica's pattern matching syntax. Here the 'binding symbols' appear in front of a colon, with some sort of pattern on the right of the colon. The most general pattern is just an underscore: _, which can match against anything. Thus f[_] matches against f[2], but produces a single empty binding, while f[x:_] matches against f[2], producing a single binding sending x to 2. For many simple patterns, you can actually omit the colon -- x_ is just shorthand for x:_. To indicate a repeated pattern, follow the pattern with '..'.

Thus

$$f[(x:\_).., y:(\_..)] <- f[1,1,1,2,2] = \{ (x->1, y->(1,1,2,2)), (x->1, y->(1,2,2)), (x->1, y->(2,2)) \}$$
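This sequence pattern can be sketched directly in Python, hand-rolled for this one shape (the encoding and names are my own): the repeated chunk must consist of equal elements, since x has to bind consistently, and y takes whatever is left.

```python
# Sketch of f[(x:_).., y:(_..)]: enumerate the split points, keeping
# those where the first chunk is all one value (the consistent x).
# Expressions: symbols are strings, compounds are tuples (head, leaves...).

def f_repeated_then_rest(expr):
    """All ways of matching f[(x:_).., y:(_..)] against expr."""
    if not (isinstance(expr, tuple) and expr and expr[0] == "f"):
        return []
    leaves = expr[1:]
    bindings = []
    # split point i: leaves[:i] is the repeated chunk, leaves[i:] is y
    for i in range(1, len(leaves)):             # both chunks must be nonempty
        chunk, rest = leaves[:i], leaves[i:]
        if all(e == chunk[0] for e in chunk):   # repeated x binds consistently
            bindings.append({"x": chunk[0], "y": rest})
    return bindings

print(f_repeated_then_rest(("f", 1, 1, 1, 2, 2)))
# [{'x': 1, 'y': (1, 1, 2, 2)}, {'x': 1, 'y': (1, 2, 2)}, {'x': 1, 'y': (2, 2)}]
```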

Now for a replacement rule. The most general form is simply a function taking an expression and producing a sequence of expressions. In practice, however, many of the replacement rules we use will simply be a pair, a pattern and an expression, written as p -> x. How do we see this as a replacement rule? Well, when applied to an expression y, we consider all possible bindings of y against the pattern p. For each such binding, we replace any bound symbols in x with the corresponding expressions.

A first example: consider the replacement rule f(x) -> x^2. How does this act on f(2)? As we've seen, there's exactly one binding for f(x) <- f(2), namely x->2, so we 'make this substitution' in the expression x^2, obtaining 4. Thus

$$(f(x) -> x^2)(f(2)) = \{ 4 \}$$
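Here's a sketch of this kind of rule in Python (the tuple encoding of expressions and all names are my own): match the pattern, then substitute each binding into the right-hand side. Note the result below is the unevaluated expression ('power', 2, 2); turning that into 4 is the evaluator's job.

```python
# Expressions: symbols are strings, compounds are tuples (head, leaves...).

def substitute(expr, binding):
    """Replace bound symbols in expr with their expressions."""
    if isinstance(expr, str) and expr in binding:
        return binding[expr]
    if isinstance(expr, tuple):
        return tuple(substitute(e, binding) for e in expr)
    return expr

def rule(pattern, rhs):
    """Turn (pattern, rhs) into a function expression -> list of results."""
    return lambda expr: [substitute(rhs, b) for b in pattern(expr)]

# The rule f(x) -> x^2, with a pattern matching f applied to anything:
def f_of_anything(expr):
    if isinstance(expr, tuple) and len(expr) == 2 and expr[0] == "f":
        return [{"x": expr[1]}]
    return []

square = rule(f_of_anything, ("power", "x", 2))
print(square(("f", 2)))   # [('power', 2, 2)]
print(square(("g", 2)))   # []
```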

More general replacement rules are possible; a common form we use in the implementation of omath is a pair consisting of a pattern, and a function turning bindings into expressions (a function implemented in whatever programming language is appropriate in the context). The replacement rule itself then iterates through all ways of matching the pattern, feeding the bindings into the function to produce results. When we can be sure that a given pattern will always produce bindings for the same collection of symbols (ie, different values in the binding, but always the same set of symbols), the function used in the replacement rule could take named arguments, and have some external framework automatically 'unwrap' the binding.
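A sketch of this 'pattern plus function' form (the shapes and names are assumed for illustration), including the named-argument unwrapping just mentioned:

```python
# Expressions: symbols are strings, compounds are tuples (head, leaves...).

def function_rule(pattern, fn):
    """pattern: expr -> list of bindings; fn receives a binding as kwargs."""
    return lambda expr: [fn(**b) for b in pattern(expr)]

# A pattern for 'f applied to anything', binding the argument to x:
def f_pat(e):
    if isinstance(e, tuple) and len(e) == 2 and e[0] == "f":
        return [{"x": e[1]}]
    return []

# Since f_pat always binds exactly the symbol x, the function can just
# take a named argument x -- the 'unwrapping' happens via **b above.
square = function_rule(f_pat, lambda x: x * x)
print(square(("f", 5)))   # [25]
print(square(("g", 5)))   # []
```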

Finally, the most general form of a replacement rule can completely ignore the usual notion of a pattern, and generate results in any manner whatsoever!

That was the basics; now we return to the three components of omath-

a pattern matcher
Generally, a pattern matcher consists of a set of rules for interpreting an expression as a pattern. For example, the Mathematica syntax $$f[(x:\_).., y:(\_..)]$$ is translated by the Mathematica parser as the expression f[Repeated[Pattern[x, Blank[]]], Pattern[y, Repeated[Blank[]]]]. The pattern matcher then compiles this into a 'pattern' in the sense above. In general, any expression whatsoever can be compiled to a pattern in this way, but most things will result in a pattern which only matches against the original expression. The pattern matcher can then add special meanings to particular expressions, as Blank, Pattern, and Repeated demonstrate above.
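As a sketch of this compilation step (simplified: only Blank and Pattern are given special meanings here; the encoding and names are my assumptions, not omath's API), everything else compiles to a literal match, with compounds compiled leafwise:

```python
# Expressions: symbols are strings, compounds are tuples (head, leaves...).
# compile_pattern turns an expression into a matcher: expr -> list of bindings.

def compile_pattern(pexpr):
    if pexpr == ("Blank",):                       # Blank[]: match anything
        return lambda expr: [{}]
    if isinstance(pexpr, tuple) and len(pexpr) == 3 and pexpr[0] == "Pattern":
        name, sub = pexpr[1], compile_pattern(pexpr[2])   # Pattern[name, sub]
        return lambda expr: [{**b, name: expr} for b in sub(expr)]
    if isinstance(pexpr, tuple):                  # generic compound: leafwise
        parts = [compile_pattern(p) for p in pexpr]
        def match(expr):
            if not (isinstance(expr, tuple) and len(expr) == len(parts)):
                return []
            results = [{}]
            for part, e in zip(parts, expr):
                results = [{**b, **b2}
                           for b in results for b2 in part(e)
                           if all(b.get(k, v) == v for k, v in b2.items())]
            return results
        return match
    return lambda expr: [{}] if expr == pexpr else []     # literal atom

# f[x_] parses to f[Pattern[x, Blank[]]]:
p = compile_pattern(("f", ("Pattern", "x", ("Blank",))))
print(p(("f", 2)))   # [{'x': 2}]
print(p(("g", 2)))   # []
```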

Hopefully I'll have time to show you some more examples of the pattern matcher, and how patterns are actually implemented, because for me it's been the most fun code to write!

storing replacement rules
This isn't so exciting, from a theoretical point of view, although in practice it is quite involved. The simplest system would be simply to maintain a single list of rules, and to try each in turn on any given expression. In practice, this is extremely slow, as many irrelevant rules have to be tested at every step of the evaluation. Instead, pattern-based rules are partitioned up according to the symbols appearing in the head, or as the heads of the leaves, and only subsets of the full collection of rules are considered at any given time, depending on the symbols appearing in the uppermost layers of the current expression.
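A minimal sketch of the partitioning idea in Python (the data layout is my guess for illustration, not omath's actual kernel state): rules are filed under the symbol at the head of their pattern, so evaluation only consults the relevant bucket instead of scanning every rule.

```python
from collections import defaultdict

# Expressions: symbols are strings, compounds are tuples (head, leaves...).

class RuleStore:
    def __init__(self):
        self.by_head = defaultdict(list)   # head symbol -> list of rules

    def add(self, head, rule):
        """rule: a function expression -> list of result expressions."""
        self.by_head[head].append(rule)

    def apply(self, expr):
        """Try only the rules filed under expr's head, in definition order."""
        head = expr[0] if isinstance(expr, tuple) else expr
        for rule in self.by_head.get(head, []):
            results = rule(expr)
            if results:
                return results[0]      # take the first successful rewrite
        return expr                    # no rule applied

store = RuleStore()
# A rule for f: square an integer argument.
store.add("f", lambda e: [e[1] ** 2] if isinstance(e[1], int) else [])
print(store.apply(("f", 3)))   # 9
print(store.apply(("g", 3)))   # ('g', 3) -- no rules filed under g
```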

Nearly always, storing rules as either an unordered set, or an ordered list, is not so good. If we don't specify an order on the rules, evaluation is nondeterministic, which is often not desirable. On the other hand, if rules are tried simply in the order they are defined, it's not possible to later define a 'more specific' rule which overrides the behaviour of a 'more general' rule. In particular implementations, we'll need to describe some order, or partial order, on patterns or replacement rules...

There are also interesting practical optimisations which can be made -- in particular it's often possible to assemble patterns with shared subpatterns into trees, so that if two patterns share an initial subpattern, matching against that initial subpattern only occurs once. Nothing like this has been implemented so far, and I'm resisting the temptation to do it, because it would be a fun project for someone who wanted to learn about pattern matching! :-)

For future reference, the component of the omath system responsible for storing replacement rules is called the 'kernel state'. (Maybe not quite the right name; some aspects of the state of the system are stored elsewhere?)

an evaluation strategy
There are many different 'evaluation strategies' one might pursue, and many variations on the particular one Mathematica itself uses. I'll describe a few, gradually approximating the Mathematica one (it's interesting, mostly because it's full of compromises, some of which we might want to resolve differently).

The very simplest might be called 'entire expression, fixed point evaluation'. To 'evaluate' an expression, we simply repeatedly apply replacement rules to the entire expression, until (hopefully) reaching a fixed point. (Possibly with some mechanism to bail out after trying to reach a fixed point for too long, or for detecting loops.)

However, this isn't so useful. Say we define a rule $$f[x\_Integer]:=x^2$$, and then try to evaluate $$f[f[3]]$$. Using 'entire expression' evaluation, this will not change under evaluation, simply because f[3] does not look like an integer! Much more useful is 'recursive' evaluation. Now, before trying to apply any replacement rules to the entire expression, we recursively evaluate each of the leaves of the expression (actually, we evaluate the head of the expression as well!). Under recursive evaluation, to evaluate $$f[f[3]]$$ we first evaluate the leaf f[3], to which the rule above applies, returning 9. We now evaluate f[9], obtaining 81, as desired.
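A sketch of recursive, fixed point evaluation as just described (the encoding of expressions and the rewrite interface are my own assumptions): leaves and head are evaluated first, then rules are applied to the whole expression, repeating until nothing changes.

```python
# Expressions: symbols are strings, compounds are tuples (head, leaves...).

def evaluate(expr, rewrite, limit=100):
    """rewrite: expression -> expression (identity if no rule applies)."""
    for _ in range(limit):                 # bail out rather than loop forever
        if isinstance(expr, tuple):
            # recursively evaluate the head and each leaf first
            expr = tuple(evaluate(e, rewrite, limit) for e in expr)
        new = rewrite(expr)
        if new == expr:                    # fixed point reached
            return expr
        expr = new
    raise RuntimeError("no fixed point reached")

# The rule f[x_Integer] := x^2, as a rewrite function:
def square_rule(expr):
    if (isinstance(expr, tuple) and len(expr) == 2
            and expr[0] == "f" and isinstance(expr[1], int)):
        return expr[1] ** 2
    return expr

print(evaluate(("f", ("f", 3)), square_rule))   # 81
```

The inner f[3] rewrites to 9 while the leaves are evaluated, after which the outer expression is f[9], which rewrites to 81.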

In practice, we need to make recursive evaluation considerably more complicated, in order to be useful. In particular, we need to be able to prevent evaluation of leaves when appropriate. For example, let's consider implementing a conditional statement (an if-else statement) in this term rewriting model. A first try might be

If[True, x_, _] := x
If[False, _, y_] := y

This works pretty well, on first glance. If we type

If[3^2 == 9, foo, bar]

3^2 evaluates to 9, 9 == 9 evaluates to True, and the entire expression is replaced by foo. Let's consider a more complicated version, where instead of 'just symbols' foo and bar, we have expressions whose evaluations have (cataclysmic!) side effects.

If[TheRussiansAreComing[], LaunchNukes[], DrinkTea[]]

What happens now? We evaluate each of the leaves; TheRussiansAreComing[] presumably evaluates to False, LaunchNukes[] causes us to start a war as a side effect, then returns some form of confirmation, say the expression Done, and DrinkTea[] has less alarming side effects, returning Mmmm. Finally, of course, the rule for If applies, now to If[False, Done, Mmmm], which is replaced by Mmmm. We enjoy our tea, and don't even realise we've ended the world... Obviously not a good idea.

Clearly, If should not allow its later arguments to be evaluated until after it has access to the result of evaluating the first argument. In Mathematica-language, we'd say If has the attribute HoldRest, meaning all but the first argument are 'held'. Now the above statement will be transformed into the intermediate result If[False, LaunchNukes[], DrinkTea[]], after the leaves have been appropriately evaluated, which will be replaced, by the rule above, with DrinkTea[]. The next step of the fixed point expression evaluator will then replace DrinkTea[] with Mmmm, with appropriate side effects.
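The whole scenario can be sketched in Python (the attribute table, encoding, and names are assumptions for this example): with If marked HoldRest, only the condition is evaluated before If's rule fires, and no nukes are launched.

```python
# Expressions: symbols are strings, compounds are tuples (head, leaves...).

HOLD_REST = {"If"}     # heads whose leaves, except the first, are 'held'
side_effects = []      # a record of what actually happened

def rewrite(expr):
    if isinstance(expr, tuple):
        head = expr[0]
        if head == "If" and expr[1] is True:
            return expr[2]
        if head == "If" and expr[1] is False:
            return expr[3]
        if head == "LaunchNukes":
            side_effects.append("war")        # cataclysmic side effect!
            return "Done"
        if head == "DrinkTea":
            side_effects.append("tea")
            return "Mmmm"
        if head == "TheRussiansAreComing":
            return False
    return expr

def evaluate(expr):
    if isinstance(expr, tuple):
        if expr[0] in HOLD_REST:
            # evaluate only the first leaf; hold the rest
            expr = (expr[0], evaluate(expr[1])) + expr[2:]
        else:
            expr = tuple(evaluate(e) for e in expr)
    new = rewrite(expr)
    return expr if new == expr else evaluate(new)

result = evaluate(("If", ("TheRussiansAreComing",),
                   ("LaunchNukes",), ("DrinkTea",)))
print(result, side_effects)   # Mmmm ['tea'] -- no war!
```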

There are plenty of other subtle complications in making a recursive evaluator useful, which I'll omit today!

There's another evaluation strategy worth mentioning, in contrast to what I've been calling 'fixed point'. This is 'infinite evaluation' (not really the right name; this one should just be called fixed point evaluation, and what I described above should be 'naive fixed point'). Again, we repeatedly apply rules (presumably recursively, with the same complications as above), but take more into account while looking for a fixed point. Instead of stopping once the expression doesn't change, we also take into account the entire kernel state (ie, the list of rules stored in the kernel), and continue evaluation if anything has changed there.

Here's a good example:

count = 0; f[x_] /; count++ > 100 := 0
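A sketch of how infinite evaluation handles this example (the encoding and names are my own assumptions): the rule's condition count++ > 100 fails at first but bumps a piece of kernel state each time, so evaluation keeps retrying as long as either the expression or the state changed, and eventually f[1] rewrites to 0.

```python
# Expressions: symbols are strings, compounds are tuples (head, leaves...).

state = {"count": 0}   # a stand-in for the mutable kernel state

def rewrite(expr):
    """The rule f[x_] /; count++ > 100 := 0."""
    if isinstance(expr, tuple) and len(expr) == 2 and expr[0] == "f":
        fired = state["count"] > 100     # the /; condition (old value)
        state["count"] += 1              # the count++ side effect
        if fired:
            return 0
    return expr

def infinite_evaluate(expr):
    while True:
        before = dict(state)
        new = rewrite(expr)
        if new == expr and before == state:   # neither expr nor state moved
            return expr
        expr = new

result = infinite_evaluate(("f", 1))
print(result)   # 0, after ~100 'failed' matches that each bumped count
```

Naive fixed point evaluation would have stopped after the first attempt, since f[1] came back unchanged; it's only by watching the kernel state that we keep going.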