Data types a la carte and in particular P. Bahr and T. Hvitved’s implementation compdata provide a way to create extensible data types and a library of recursion schemes ready to use on them. Compositional data types and recursion schemes are useful tools for manipulating recursive structures that under go structural changes while retaining some shape/data through different passes such compiler passes on tree types.

I’m going to go through a example of using compdata to transform a simple expression tree by adding and removing timing annotation and delay nodes through recursion schemes. The timing and delays correspond to buffering in synchronous circuits but that won’t be the focus here.

Building our expression trees

Lets start with a simple untyped expression data type with some placeholder unary and binary operations:

data Expr r
  = ExprUnaryOp UnaryOp r
  | ExprBinaryOp BinaryOp r r
  | ExprLiteral Int
  | ExprVariable Name
  deriving (Show, Eq, Functor)

newtype Name = Name Int deriving (Show, Eq)

data BinaryOp = OpAdd deriving (Show, Eq, Enum, Bounded)
data UnaryOp = OpNegate deriving (Show, Eq, Enum, Bounded)

Expr has been defined using a functor representation- elements of type r will be children nodes of the expression. This style of recursive data type representation is common to recursion scheme libraries as it allows recursion to be expressed through operations on functors. If you are unfamiliar with this representation a good intro is https://blog.sumtypeofway.com/an-introduction-to-recursion-schemes/. There are a number of other intros to recursion schemes although many assume knowledge of category theory terminology.

As mentioned above, my particular use is going to be extending the simple expression type above to include delay nodes that represent the placement of timing delays for synchronous circuits. We can start by adding an additional data type representing delay nodes and then using the type operator :+: exposed by compdata to combine it with our original Expr, giving us a compositional type where any node can be of type Expr or ExprDelay:

data ExprDelay r = ExprDelay r
  deriving (Show, Eq, Functor)

type ExprWithDelays = ExprDelay :+: Expr

Necessary boilerplate

There is a lot of boilerplate needed to make these types usable, but thankfully compdata provides a template Haskell solution to generate the common cases. As the names below suggest this includes foldable and traversable instances which will allow us to treat our child nodes as a generic collection, along with smart constructors which give us a straight-forward way to define terms.

$(derive [makeTraversable, makeFoldable, makeShowConstr, 
          makeEqF, makeShowF, smartConstructors, smartAConstructors]
         [''Expr, ''ExprDelay])

Arbitrary instances are always useful so lets add them, and while we are at it we can add pretty printing Render standalone deriving declarations that will print constructor names:

instance Arbitrary Name where
  arbitrary = Name <$> abs `fmap` (arbitrary :: Gen Int)

instance Arbitrary UnaryOp where
  arbitrary = elements [minBound..maxBound]
instance Arbitrary BinaryOp where
  arbitrary = elements [minBound..maxBound]

$(derive [makeArbitraryF] [''Expr, ''ExprDelay])

deriving instance Render Expr
deriving instance Render ExprDelay

Finally, it’s always useful to have at least one example on hand so lets add an example expression. Since we already have Arbitrary instances we can generate examples directly in GHCi with generate arbitrary :: IO (Term Expr). Term is the fixed-point data type provided by compdata, and any examples copied from GHCi will need to be converted to smart constructor form by prefixing the constructors with i. These are functions generated by the template Haskell above that injects our types into the fixed-point Term representation, and requires an explicit type annotation which can be anything that includes our data type as a subtype. That means we can use the same constructors to create many different structures- the Term Expr below could be replaced with anything that includes Expr as a subtype e.g. Term (Expr :+: ExprDelay).

example :: Term Expr
example = iExprBinaryOp OpAdd (iExprUnaryOp OpNegate (iExprBinaryOp OpAdd (iExprLiteral 0) (iExprVariable $ Name 1))) (iExprVariable $ Name 0)

There is nothing special about the above instance but as it uses a mix of nodes we can use it as a litmus test during development.

Annotating trees with timing

Now to the meat of the problem. I’d like to “retime” the tree by adding delay nodes after operations. In the future I’d like the decision of where to add delays to be more intelligent but for now I am going to follow these simple rules:

  1. Literals and combinations of literals and operations should not have any timing associated with them.
  2. Any variables instantiations should be sampled at time 0.
  3. Any operation that isn’t covered by (1) should occur at the max timing of its children, and should be followed by a delay node.
  4. Any child node to an operation that isn’t the max timing of the children should be delayed with delay nodes until it is equal to the max timing of the children.

I’m going to use Maybe Int to represent the timing of a node, and use the compdata annotation operator :&: to attach some non-functor value to each node of our recursive structure. We don’t need an explicit data type for the annotated tree, but I will derive a Render instance for it.

type Timing = Maybe Int
deriving instance Render (ExprWithDelays :&: Timing)

Since the timing depends solely on the direct children of a node, we can perform all the steps above with a single pass through our structure from leaves to root. Recursion schemes give us a principled way to do this called a catamorphism, which amounts to a fold through our data structure. If I decide later to add different timing strategies it might make sense to reach for something more exotic such as a zygomorphism, but we won’t bother with that for now. If you’re interested in learning more about the different types of common morphisms I recommend reading through a cheat-sheet style reference such as this one.

Catamorphisms work by calling some algebra function on each level of our data structure. The algebra function is of type type Alg f a = f a -> a. The thing to realize here is that f is the current unwrapped node we are working with, and its children are of type a corresponding to results of the algebra on its children. If you haven’t seen this before it might take a moment to click, and I suggest playing around with the compdata examples.

Since I know I want to keep the tree structure but add timing information, the algebra will build the tree back up with a type like Term (ExprWithDelays :&: Timing). We can actually be a bit more general and use typeclass and that applies our algebra to any structure, and then only add instances for relevant types. In the snippet below liftSum instances provide default instances to the constructors of :+: which may exist in our f type.

class Retime f v where
  retime :: Alg f (Term (v :&: Timing))

$(derive [liftSum] [''Retime])

Lastly the instances themselves. From our rules above the cases for ExprLiteral and ExprVariable are straight forward, we can just use the smart constructors that take an additional annotation parameter to attach our timing:

instance (Functor v, Expr :<: v, ExprDelay :<: v) => Retime Expr v where
  retime (ExprLiteral x) = iAExprLiteral Nothing x
  retime (ExprVariable n) = iAExprVariable (Just 0) n

Things get a little more hairy for the other instances but nothing too bad. Accessing the timing annotations in previous results requires a little bit of unwrapping that might be better refactored into standalone functions. A helper function “go” makes adds variable length delay limbs, and most of the rest of the code is dedicated to different Maybe timing conditions.


  retime (ExprUnaryOp op v@(Term (_ :&: Nothing))) =
    iAExprUnaryOp Nothing op v
  retime (ExprUnaryOp op v@(Term (_ :&: t))) =
    iAExprDelay (fmap (+1) t) (iAExprUnaryOp t op v)

  retime (ExprBinaryOp op
            l@(Term (_ :&: lm))
            r@(Term (_ :&: rm))
    ) = case (lm, rm) of
      (Nothing, Nothing) -> iAExprBinaryOp Nothing op l r
      (t, Nothing) -> iAExprDelay (fmap (+1) t) (iAExprBinaryOp t op l r)
      (Nothing, t) -> iAExprDelay (fmap (+1) t) (iAExprBinaryOp t op l r)
      (Just lt, Just rt) ->
        iAExprDelay (Just $ max (lt+1) (rt+1))
          (iAExprBinaryOp (Just $ max lt rt) op
            (go lt (rt - lt) l)
            (go rt (lt - rt) r)
          )
    where
      go t i | i > 0     = go (t+1) (i-1) . iAExprDelay (Just (t+1))
             | otherwise = id

We could also write this a different way. Since our nodes have instances of foldable, we could take the generic approach and fold/map over children to find and resolve timing. Unfortunately building the structure back up requires injections back into the compdata structures that muddies the code:

  -- This replaces the 'retime (ExprUnaryOp...' and 'retime (ExprBinaryOp...' above.
  retime e = case longestDelay of
      Nothing -> Term $ injectA Nothing (inj e)
      -- More generic, but unfortunately less readable:
      Just t -> iAExprDelay (Just $ 1 + getMax t) -- 5. Add an additional delay node
              $ Term                              -- 4. Wrap into the Term datatype
              $ injectA (Just $ getMax t)         -- 3. Add our timing annotation for e
              $ inj                               -- 2. Inject e into the v datatype
              $ fmap (resolveTiming (getMax t))   -- 1. Resolve timing for children
              $ e
    where
      timing (Term(_:&:t)) = t
      longestDelay = foldMap (fmap Max . timing) e
      resolveTiming tmax v@(Term (_ :&: Nothing))  = v
      resolveTiming tmax v@(Term (_ :&: (Just t))) = go t (tmax-t) v
      go t i | i > 0     = go (t+1) (i-1) . iAExprDelay (Just (t+1))
             | otherwise = id

The generic version has the benefit of not requiring updating when modifying Expr, but is a poor option if you are working with people who might not be comfortable with compdata structures.

Property checking

There is a direct opposite to adding delays and timing annotations- removing them. compdata already provides a method to remove annotations via stripA. To remove delay expression nodes we can write another much simpler algebra that removes any ExprDelay nodes and leaves any others untouched:

class RemoveDelays f v where
  removeDelays :: Alg f (Term v)

$(derive [liftSum] [''RemoveDelays])

instance {-# OVERLAPPABLE #-} (f :<: v) => RemoveDelays f v where
  removeDelays = inject
instance RemoveDelays ExprDelay v where
  removeDelays (ExprDelay r) = r

This leads us to a QuickCheck-able property that Exprs are invariant under adding removing timing:

prop_addRemoveTiming :: Term Expr -> Bool
prop_addRemoveTiming x = x == (removeTiming . addTiming $ x)
  where
    addTiming x = cata retime x :: Term (ExprWithDelays :&: Timing)
    removeTiming x = cata removeDelays (stripA x) :: Term Expr

And run it in GHCi:

ghci> quickCheck prop_addRemov
+++ OK, passed 100 tests.

Success! We can continue to add property checks to test the rules above and potentially structure them as catamorphisms, but I’ll stop here for now. Compilable code can be found at https://github.com/blaxill/blog-compdata-trees-and-catamorphisms.