# Compdata Trees and Catamorphisms

Data types a la carte and in particular P. Bahr and T. Hvitved’s implementation compdata provide a way to create extensible data types and a library of recursion schemes ready to use on them. Compositional data types and recursion schemes are useful tools for manipulating recursive structures that under go structural changes while retaining some shape/data through different passes such compiler passes on tree types.

I’m going to go through a example of using compdata to transform a simple expression tree by adding and removing timing annotation and delay nodes through recursion schemes. The timing and delays correspond to buffering in synchronous circuits but that won’t be the focus here.

## Building our expression trees

Lets start with a simple untyped expression data type with some placeholder unary and binary operations:

```
data Expr r
= ExprUnaryOp UnaryOp r
| ExprBinaryOp BinaryOp r r
| ExprLiteral Int
| ExprVariable Name
deriving (Show, Eq, Functor)
newtype Name = Name Int deriving (Show, Eq)
data BinaryOp = OpAdd deriving (Show, Eq, Enum, Bounded)
data UnaryOp = OpNegate deriving (Show, Eq, Enum, Bounded)
```

`Expr`

has been defined using a functor representation- elements of type `r`

will be children nodes of the expression. This style of recursive data type
representation is common to recursion scheme libraries as it allows recursion to
be expressed through operations on functors. If you are unfamiliar with this
representation a good intro is
https://blog.sumtypeofway.com/an-introduction-to-recursion-schemes/. There are
a number of other intros to recursion schemes although many assume knowledge of
category theory terminology.

As mentioned above, my particular use is going to be extending the simple expression
type above to include delay nodes that represent the placement of timing
delays for synchronous circuits. We can start by adding an additional data type
representing delay nodes and then using the type operator `:+:`

exposed by
`compdata`

to combine it with our original `Expr`

, giving us a compositional
type where any node can be of type `Expr`

or `ExprDelay`

:

```
data ExprDelay r = ExprDelay r
deriving (Show, Eq, Functor)
type ExprWithDelays = ExprDelay :+: Expr
```

## Necessary boilerplate

There is a lot of boilerplate needed to make these types usable, but thankfully compdata provides a template Haskell solution to generate the common cases. As the names below suggest this includes foldable and traversable instances which will allow us to treat our child nodes as a generic collection, along with smart constructors which give us a straight-forward way to define terms.

```
$(derive [makeTraversable, makeFoldable, makeShowConstr,
makeEqF, makeShowF, smartConstructors, smartAConstructors]
[''Expr, ''ExprDelay])
```

`Arbitrary`

instances are always useful so lets add them, and while we are at it we can add pretty printing `Render`

standalone deriving declarations that will print constructor names:

```
instance Arbitrary Name where
arbitrary = Name <$> abs `fmap` (arbitrary :: Gen Int)
instance Arbitrary UnaryOp where
arbitrary = elements [minBound..maxBound]
instance Arbitrary BinaryOp where
arbitrary = elements [minBound..maxBound]
$(derive [makeArbitraryF] [''Expr, ''ExprDelay])
deriving instance Render Expr
deriving instance Render ExprDelay
```

Finally, it’s always useful to have at least one example on hand so lets add an example
expression. Since we already have `Arbitrary`

instances we can generate
examples directly in `GHCi`

with `generate arbitrary :: IO (Term Expr)`

. `Term`

is the fixed-point data type provided by `compdata`

, and any
examples copied from `GHCi`

will need to be converted to smart constructor form by
prefixing the constructors with `i`

. These are functions generated by the
template Haskell above that injects our types into the fixed-point `Term`

representation, and requires an explicit type annotation which can be anything that includes our data type as a subtype. That means we can use the same constructors to create many different structures- the `Term Expr`

below could be replaced with anything that includes `Expr`

as a subtype e.g. `Term (Expr :+: ExprDelay)`

.

```
example :: Term Expr
example = iExprBinaryOp OpAdd (iExprUnaryOp OpNegate (iExprBinaryOp OpAdd (iExprLiteral 0) (iExprVariable $ Name 1))) (iExprVariable $ Name 0)
```

There is nothing special about the above instance but as it uses a mix of nodes we can use it as a litmus test during development.

## Annotating trees with timing

Now to the meat of the problem. I’d like to “retime” the tree by adding delay nodes after operations. In the future I’d like the decision of where to add delays to be more intelligent but for now I am going to follow these simple rules:

- Literals and combinations of literals and operations should not have any timing associated with them.
- Any variables instantiations should be sampled at time 0.
- Any operation that isn’t covered by (1) should occur at the max timing of its children, and should be followed by a delay node.
- Any child node to an operation that isn’t the max timing of the children should be delayed with delay nodes until it is equal to the max timing of the children.

I’m going to use `Maybe Int`

to represent the timing of a node, and use the `compdata`

annotation operator `:&:`

to attach some non-functor value to each node of our recursive structure. We don’t need an explicit data type for the annotated tree, but I will derive a `Render`

instance for it.

```
type Timing = Maybe Int
deriving instance Render (ExprWithDelays :&: Timing)
```

Since the timing depends solely on the direct children of a node, we can perform all the steps above with a single pass through our structure from leaves to root. Recursion schemes give us a principled way to do this called a catamorphism, which amounts to a fold through our data structure. If I decide later to add different timing strategies it might make sense to reach for something more exotic such as a zygomorphism, but we won’t bother with that for now. If you’re interested in learning more about the different types of common morphisms I recommend reading through a cheat-sheet style reference such as this one.

Catamorphisms work by calling some algebra function on each level of our data
structure. The algebra function is of type `type Alg f a = f a -> a`

. The thing
to realize here is that `f`

is the current unwrapped node we are working with,
and its children are of type `a`

corresponding to results of the algebra on
its children. If you haven’t seen this before it might take a moment to click,
and I suggest playing around with the `compdata`

examples.

Since I know I want to keep the tree structure but add timing information, the
algebra will build the tree back up with a type like `Term (ExprWithDelays :&: Timing)`

. We can
actually be a bit more general and use typeclass and that applies our algebra to any
structure, and then only add instances for relevant types. In the snippet below `liftSum`

instances provide default instances to the constructors of `:+:`

which may exist in our `f`

type.

```
class Retime f v where
retime :: Alg f (Term (v :&: Timing))
$(derive [liftSum] [''Retime])
```

Lastly the instances themselves. From our rules above the cases for
`ExprLiteral`

and `ExprVariable`

are straight forward, we can just use the smart
constructors that take an additional annotation parameter to attach our timing:

```
instance (Functor v, Expr :<: v, ExprDelay :<: v) => Retime Expr v where
retime (ExprLiteral x) = iAExprLiteral Nothing x
retime (ExprVariable n) = iAExprVariable (Just 0) n
```

Things get a little more hairy for the other instances but nothing too bad. Accessing the timing annotations in previous results requires a little
bit of unwrapping that might be better refactored into standalone functions. A helper
function “`go`

” makes adds variable length delay limbs, and most of
the rest of the code is dedicated to different `Maybe`

timing conditions.

```
retime (ExprUnaryOp op v@(Term (_ :&: Nothing))) =
iAExprUnaryOp Nothing op v
retime (ExprUnaryOp op v@(Term (_ :&: t))) =
iAExprDelay (fmap (+1) t) (iAExprUnaryOp t op v)
retime (ExprBinaryOp op
l@(Term (_ :&: lm))
r@(Term (_ :&: rm))
) = case (lm, rm) of
(Nothing, Nothing) -> iAExprBinaryOp Nothing op l r
(t, Nothing) -> iAExprDelay (fmap (+1) t) (iAExprBinaryOp t op l r)
(Nothing, t) -> iAExprDelay (fmap (+1) t) (iAExprBinaryOp t op l r)
(Just lt, Just rt) ->
iAExprDelay (Just $ max (lt+1) (rt+1))
(iAExprBinaryOp (Just $ max lt rt) op
(go lt (rt - lt) l)
(go rt (lt - rt) r)
)
where
go t i | i > 0 = go (t+1) (i-1) . iAExprDelay (Just (t+1))
| otherwise = id
```

We could also write this a different way. Since our nodes have instances of foldable, we could take the generic approach and fold/map over children to find and resolve timing. Unfortunately building the structure back up requires injections back into the `compdata`

structures that muddies the code:

```
-- This replaces the 'retime (ExprUnaryOp...' and 'retime (ExprBinaryOp...' above.
retime e = case longestDelay of
Nothing -> Term $ injectA Nothing (inj e)
-- More generic, but unfortunately less readable:
Just t -> iAExprDelay (Just $ 1 + getMax t) -- 5. Add an additional delay node
$ Term -- 4. Wrap into the Term datatype
$ injectA (Just $ getMax t) -- 3. Add our timing annotation for e
$ inj -- 2. Inject e into the v datatype
$ fmap (resolveTiming (getMax t)) -- 1. Resolve timing for children
$ e
where
timing (Term(_:&:t)) = t
longestDelay = foldMap (fmap Max . timing) e
resolveTiming tmax v@(Term (_ :&: Nothing)) = v
resolveTiming tmax v@(Term (_ :&: (Just t))) = go t (tmax-t) v
go t i | i > 0 = go (t+1) (i-1) . iAExprDelay (Just (t+1))
| otherwise = id
```

The generic version has the benefit of not requiring updating when modifying `Expr`

, but is a poor option if you are working with people who might not be comfortable with `compdata`

structures.

## Property checking

There is a direct opposite to adding delays and timing annotations- removing them.
`compdata`

already provides a method to remove annotations via `stripA`

. To
remove delay expression nodes we can write another much simpler algebra that
removes any `ExprDelay`

nodes and leaves any others untouched:

```
class RemoveDelays f v where
removeDelays :: Alg f (Term v)
$(derive [liftSum] [''RemoveDelays])
instance {-# OVERLAPPABLE #-} (f :<: v) => RemoveDelays f v where
removeDelays = inject
instance RemoveDelays ExprDelay v where
removeDelays (ExprDelay r) = r
```

This leads us to a QuickCheck-able property that `Expr`

s are invariant under
adding removing timing:

```
prop_addRemoveTiming :: Term Expr -> Bool
prop_addRemoveTiming x = x == (removeTiming . addTiming $ x)
where
addTiming x = cata retime x :: Term (ExprWithDelays :&: Timing)
removeTiming x = cata removeDelays (stripA x) :: Term Expr
```

And run it in GHCi:

```
ghci> quickCheck prop_addRemov
+++ OK, passed 100 tests.
```

Success! We can continue to add property checks to test the rules above and potentially structure them as catamorphisms, but I’ll stop here for now. Compilable code can be found at https://github.com/blaxill/blog-compdata-trees-and-catamorphisms.