You can find all the code in this post in the following GitHub repository.

Let’s start with …

Let’s consider the following type of (non-empty) binary trees, implemented in Scala as a common algebraic data type:

```scala
sealed abstract class Tree[A]
case class Leaf[A](a: A) extends Tree[A]
case class Node[A](left: Tree[A], root: A, right: Tree[A]) extends Tree[A]
```

We want to implement two functions that allow us to get and update the leaves of a given tree. As a first attempt (there will be several more attempts before we reach the solution, be patient!), we may come up with the following signatures:

```scala
class Leafs[A]{
  def get(tree: Tree[A]): List[A] = ???
  def update(tree: Tree[A]): List[A] => Tree[A] = ???
}
```

The `get` function bears no problem: there may be one or several leaves in the input tree, and the resulting list can cope with that. The `update` function, however, while essentially being what we want, poses some problems. This method returns a function which updates the leaves of the tree given a list of new values for those nodes. Ideally, we would expect to receive a list with exactly as many values as there are leaves in the tree. But given this signature, this may not happen at all: we may receive fewer values, or more. In the former case, we are forced to make a choice: either return the original tree or throw an exception (abandoning purity). In the latter, it would be fair to return the exceeding values, besides the updated tree. In sum, the following signature seems to be more compliant with the problem at hand:

```scala
class Leafs[A]{
  def get(tree: Tree[A]): List[A] = ???
  def update(tree: Tree[A]): List[A] => Option[(List[A], Tree[A])] = ???
}
```

Essentially, the `update` method now returns a stateful computation, i.e. a value of the famous `StateT` monad. This computation is run by giving an initial list of values, and will finish with a value `None` (meaning that it couldn’t complete the computation) or `Some(l, t)`, i.e. the updated tree `t` and the list of exceeding values `l` (possibly empty). We won’t show the implementation of these methods, but you can find it in the repository of this post.
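
That said, a possible implementation along these lines might look as follows (our own sketch; the repository’s version may differ):

```scala
class Leafs[A] {
  def get(tree: Tree[A]): List[A] = tree match {
    case Leaf(a)       => List(a)
    case Node(l, _, r) => get(l) ++ get(r)
  }

  def update(tree: Tree[A]): List[A] => Option[(List[A], Tree[A])] = {
    case Nil => None  // not enough values: fail
    case values => tree match {
      // a leaf consumes exactly one value
      case Leaf(_) => Some((values.tail, Leaf(values.head)))
      // a node threads the remaining values from left to right
      case Node(l, a, r) =>
        for {
          (rest1, l2) <- update(l)(values)
          (rest2, r2) <- update(r)(rest1)
        } yield (rest2, Node(l2, a, r2))
    }
  }
}
```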

Ok, this is nice, but we are stubborn and keep insisting on finding a way to prevent the user from passing a wrong number of values to the `update` method. I mean, we want to program the signature in such a way that the compiler throws an error if the programmer tries to call our function with fewer or more values than needed. Is that even possible?

A possible signature that solves our problem is the following one:

def update(tree: Tree[A]): Vec[A, n_leaves(tree)] => Tree[A]

where `n_leaves: Tree[A] => Int` is a function that returns the number of leaves of the specified tree, and the `Vec` type represents lists of a fixed size. This signature gives the Scala compiler the required information to accept the following call:

```scala
scala> update(Node(Leaf(1), 2, Leaf(3)))(Vec(3, 1))
res11: Tree[Int] = Node(Leaf(3), 2, Leaf(1))
```

and block the following one instead, with a nice compiler error:

```scala
scala> update(Node(Leaf(1), 2, Leaf(3)))(Vec(3))
:18: error: type mismatch;
 found   : Vec[Int, 1]
 required: Vec[Int, 2]
       update(Node(Leaf(1), 2, Leaf(3)))(Vec(3))
```

… wouldn’t this be beautiful?

Alas, the above signature is not legal Scala 2.12. The problem is in the `Vec[? , ? : Nat]` type constructor. As we said, it holds two parameters. There is no problem with the first one: type constructors in Scala do indeed receive types as arguments. Another way of saying this is that types in Scala can be parameterised with respect to types. And yet another way is saying that types in Scala can be made *dependent* on types. But the second parameter of the `Vec` constructor is not a type, it’s a *value*! And we can’t parameterise types in Scala with respect to values, only with respect to types.

A type whose definition refers to values is called a *dependent type*. Indeed, the type `List[A]` in Scala also *depends* on something, to wit the type `A`. So, in a sense, we may rightfully call it a dependent type as well. However, the “dependent” qualifier is conventionally reserved for types that are parameterised with respect to values.

Can’t we solve our problem in Scala, then? Yes, we will see that we can indeed solve this problem in Scala, albeit in a different way. But before delving into the Scala solution, let’s see how we can solve this problem in a language with full-fledged dependent types, in line with the solution sketched at the beginning of this section.

First, we must define the tree data type:

```agda
module Trees where

  data Tree (A : Set) : Set where
    leaf : A -> Tree A
    node : Tree A -> A -> Tree A -> Tree A
```

This is a common algebraic data type definition, with constructors `leaf` and `node`. The definition is parameterised with respect to `A`, which is declared to be a regular type, i.e. `Set`. The resulting type `Tree A` is also a regular type (i.e. not a type constructor, which would be declared as `Set -> Set`). Next, we have to define the following function:

```agda
open import Data.Nat

n_leaves : {A : Set} -> Tree A -> ℕ
n_leaves (leaf _)     = 1
n_leaves (node l _ r) = n_leaves l + n_leaves r
```

The `n_leaves` function returns the number of leaves held by a given tree (as a natural number ℕ, declared in the `Data.Nat` module). The implementation is based on pattern matching, using the same underscore symbol that we use in Scala whenever we are not interested in some value.

Let’s now implement the promised `get` and `update` functions, which will be part of a module named `Leafs`:

```agda
module Leafs where
  open import Data.Vec
  open Trees

  get : {A : Set} -> (s : Tree A) -> Vec A (n_leaves s)
  get = ?

  update : {A : Set} -> (s : Tree A) -> Vec A (n_leaves s) -> Tree A
  update = ?
```

As you can see, we can now use the `n_leaves s` value in a type definition! Indeed, the `Vec (A : Set) (n : ℕ)` type is a truly dependent type. It represents lists of values of a fixed size `n`. Moreover, the size does not need to be a constant such as 1, 2, 3, etc. It can be the result of a function, as this example shows. The implications of this are huge, as we will soon realise.

Let’s expand the definition of the `get` function:

```agda
get : {A : Set} -> (s : Tree A) -> Vec A (n_leaves s)
get (leaf x)     = x ∷ []
get (node l _ r) = get l ++ get r
```

If the tree is a leaf, we just return its value in a vector of length one. Otherwise, we collect recursively the leaves of the left and right subtrees and return their concatenation. What would happen if we implemented the first clause in the pattern matching as `get (leaf x) = []` (i.e. if we attempted to return the empty vector for a leaf tree)? The compiler would complain with the following error:

```
0 != 1 of type .Agda.Builtin.Nat.Nat
when checking that the expression [] has type Vec .A (n_leaves (leaf x))
```

This error says that 0, i.e. the length of the empty vector `[]`, does not equal 1, i.e. the number of leaves of the input tree `leaf x`. All this while attempting to check that the proposed output `[]`, whose type is `Vec A 0`, has the required type `Vec .A (n_leaves (leaf x))`, i.e. `Vec A 1`. Similarly, in the second clause, the compiler will take care of checking that `n_leaves l + n_leaves r`, which is the resulting length of the vector concatenation `get l ++ get r`, equals the value `n_leaves (node l _ r)`, which according to the definition of the `n_leaves` function is indeed the case. In sum, we can’t cheat the compiler and return a vector with a number of values different from the number of leaves in the input tree. This property is hardwired in the signature of the function, thanks to the expressiveness of the Agda type system. And to be able to guarantee it, Agda needs to be able to perform computations on values at compile time.

The implementation of the `update` function is similarly beautiful:

```agda
update : {A : Set} -> (s : Tree A) -> Vec A (n_leaves s) -> Tree A
update (leaf _)     (x ∷ []) = leaf x
update (node l x r) v        = node updatedL x updatedR
  where updatedL = update l (take (n_leaves l) v)
        updatedR = update r (drop (n_leaves l) v)
```

Note that in the first clause of the pattern matching, we were able to deconstruct the input vector into the shape `x ∷ []`, without the compiler complaining about missing clauses for the `leaf` constructor. This is because Agda knows (by evaluating the `n_leaves` function) that any possible leaf tree has a number of leaves equal to one. In the second clause, the input vector has type `v : Vec A (n_leaves (node l x r))`, which Agda knows to be `v : Vec A (n_leaves l + n_leaves r)` by partially evaluating the `n_leaves` function. This is what makes the subsequent calls to update the left and right subtrees typesafe. Indeed, to update the left subtree `l` we need a vector with a number of elements equal to its number of leaves `n_leaves l`. This vector has to be a subvector of the input vector `v`, which Agda knows to have length `n_leaves l + n_leaves r`, as we mentioned before. So, the expression `take (n_leaves l) v` will compile without problems. Similarly, Agda knows that the length of the `drop (n_leaves l) v` vector will be `n_leaves r` (by checking the type of the `drop` function), which is precisely what the `update r` function needs.

Let’s exercise these definitions in the following module:

```agda
module TestLeafs where
  open import Data.Nat
  open import Data.Vec
  open Trees

  t1 : Tree ℕ
  t1 = node (node (leaf 1) 2 (leaf 3)) 4 (leaf 5)

  l1 : Vec ℕ 3
  l1 = Leafs.get t1

  t2 : Tree ℕ
  t2 = Leafs.update t1 (5 ∷ 3 ∷ 1 ∷ [])

  -- CHECK
  open import Relation.Binary.PropositionalEquality

  eq1 : l1 ≡ (1 ∷ 3 ∷ 5 ∷ [])
  eq1 = refl

  eq2 : t2 ≡ (node (node (leaf 5) 2 (leaf 3)) 4 (leaf 1))
  eq2 = refl

  -- WON'T COMPILE
  {- Error: 3 != 4 of type ℕ
     when checking that the expression get t1 has type Vec ℕ 4

  l2 : Vec ℕ 4
  l2 = Leafs.get t1
  -}

  {- Error: 0 != 2 of type ℕ
     when checking that the expression [] has type Vec ℕ 2

  t3 : Tree ℕ
  t3 = Leafs.update t1 (5 ∷ [])
  -}
```

The `l1` variable represents the leaves of the sample tree `t1`, namely the values 1, 3 and 5. Accordingly, the type of the variable is `Vec ℕ 3`. The variable `t2` is the result of updating the tree with a new collection of leaves. In both cases, we make reference to the functions `get` and `update` declared in the module `Leafs`.

The next lines *prove* that the values of these variables are the expected ones, making use of the equality type constructor `_≡_` and its `refl` constructor (note that `_≡_` is parameterised with respect to two values, so it’s a dependent type). The proof is plain `refl`exivity, i.e. `x ≡ x`, since `l1` and `t2` actually evaluate to the same values.

Note that the fact that this code compiles is enough to show that the tests pass. We don’t need to run anything! On the other hand, Agda allows us to test that our functions work as expected by implementing much more complex proofs for more expressive properties. We will leave that for another post.

Let’s come back to Scala.

We can’t make computations on values in Scala at compile time, but we can do it on types! And this suffices to solve our problem, albeit in a different form from Agda’s. We will reconcile both approaches in the next section.
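
To get a feeling for what computing with types means, here is the classic type-level encoding of natural numbers and their addition — a standard construction (not code from the post’s repository), using the implicits mechanism introduced below:

```scala
sealed trait Nat
sealed trait Zero extends Nat
sealed trait Succ[N <: Nat] extends Nat

// Type-level addition: an implicit value of type Sum.Aux[N, M, R]
// is a compile-time proof that N + M = R
trait Sum[N <: Nat, M <: Nat]{ type Out <: Nat }
object Sum{
  type Aux[N <: Nat, M <: Nat, R <: Nat] = Sum[N, M]{ type Out = R }

  // 0 + M = M
  implicit def zeroCase[M <: Nat]: Aux[Zero, M, M] =
    new Sum[Zero, M]{ type Out = M }

  // (N + 1) + M = (N + M) + 1, computed recursively
  implicit def succCase[N <: Nat, M <: Nat, R <: Nat](
    implicit sum: Aux[N, M, R]): Aux[Succ[N], M, Succ[R]] =
    new Sum[Succ[N], M]{ type Out = Succ[R] }
}
```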

Type-level computation in Scala proceeds through the implicits mechanism. But before we can exploit implicits, we first need to re-implement our `Tree` data type so that we don’t lose the *shapes* of trees:

```scala
sealed abstract class Tree[A]
case class Leaf[A](value: A) extends Tree[A]
case class Node[L <: Tree[A], A, R <: Tree[A]](
  left: L, root: A, right: R) extends Tree[A]
```

This new implementation differs from the previous one in the types of the recursive arguments of the `Node` constructor. Now, they are generic parameters `L` and `R`, declared to be subtypes of `Tree[A]`, i.e. either leaves or nodes. Essentially, this allows us to preserve the exact type of the tree: what we will call its *shape*. In essence, this is the same trick commonly used to implement heterogeneous lists in Scala (see, e.g., their implementation in the shapeless framework).

For instance, let’s compare both implementations in the REPL, with the old implementation of the `Tree` data type located in the `P` module, and the new one in the current scope:

```scala
scala> val p_tree = P.Node(P.Node(P.Leaf(1), 2, P.Leaf(3)), 4, P.Leaf(5))
p_tree: P.Node[Int] = ...

scala> val tree = Node(Node(Leaf(1), 2, Leaf(3)), 4, Leaf(5))
tree: Node[Node[Leaf[Int], Int, Leaf[Int]], Int, Leaf[Int]] = ...
```

As we can see, the type of `p_tree` is simply `Node[Int]`, whereas the type of `tree` is much more informative: we don’t only know that it is a node tree; we know that it holds exactly five elements, three of which are leaves. Its shape has not been lost.

We can apply the same trick to the `List` type, in order to preserve information about the shape of list instances (essentially, how many values a list stores). This is the resulting definition:

```scala
sealed abstract class List[A]
case class Nil[A]() extends List[A]
case class Cons[A, T <: List[A]](head: A, tail: T) extends List[A]
```
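
For instance, the shape of a two-element list is now fully recorded in its type; a REPL session would look roughly like this:

```scala
scala> val list = Cons(1, Cons(2, Nil[Int]()))
list: Cons[Int, Cons[Int, Nil[Int]]] = Cons(1,Cons(2,Nil()))
```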

Let’s see now how we can exploit these shape-aware algebraic data types to support shape-dependent, type-level computations … and finally solve our little problem. Recall the original signatures for the `get`/`update` functions, which built upon the common, non-shape-aware definitions of the `Tree` and `List` data types:

```scala
class Leafs[A]{
  def get(tree: Tree[A]): List[A] = ???
  def update(tree: Tree[A]): List[A] => Tree[A] = ???
}
```

Now we can explain their limitations in a more precise way. For instance, let’s consider the function returned by `update`. The input of this function is declared to be any `List[A]`, not a list of a particular *shape*. That’s relevant to our problem because we want the compiler to be able to block invocations with lists of an undesired shape, i.e. length. But how can we represent the shape of an algebraic data type in the Scala type system? The answer is *subtyping*: we can declare the input of that function to be some `L <: List[A]`, instead of a plain `List[A]`. There is a one-to-one correspondence between the subtypes of the algebraic data type `List[A]` and its possible shapes.

Similarly, the input trees of `get` and `update` are declared to be any `Tree[A]`, instead of trees of a particular shape `T <: Tree[A]`. This is bad, because that way we won’t be able to determine the exact list shape that must be returned for a given tree. Ok, but how can we determine the shape of list corresponding to a given shape of tree? The answer is using *type-level functions* which operate on input/output types that represent shapes.

These shape-dependent functions are declared as traits and defined through the implicits mechanism. For instance, the declaration of the type-level function between trees and lists is as follows:

```scala
trait LeafsShape[In <: Tree[A]]{
  type Out <: List[A]
  def get(t: In): Out
  def update(t: In): Out => In
}
```

The `LeafsShape` trait is parameterised with respect to any *shape* of tree. Its instance for a particular shape will give us the list shape that we can use to store the current leaves of the tree, or the new values required for those leaves. Moreover, for that particular shape of tree we also obtain its corresponding `get` and `update` implementations.

Concerning the implementation of the shape-dependent function `LeafsShape`, i.e. how we compute the shape of list corresponding to a given shape of tree, we proceed through implicits defined in its companion object. The following signatures (not for the faint of heart …) suffice:

```scala
object LeafsShape{
  type Output[T <: Tree[A], _Out] =
    LeafsShape[T]{ type Out = _Out }

  implicit def leafCase: Output[Leaf[A], Cons[A, Nil[A]]] = ???

  implicit def nodeCase[
    L <: Tree[A], R <: Tree[A],
    LOut <: List[A], ROut <: List[A]](implicit
      ShapeL: Output[L, LOut],
      ShapeR: Output[R, ROut],
      Conc: Concatenate[A, LOut, ROut]
  ): Output[Node[L, A, R], Conc.Out] = ???
}
```

We omit the implementations of the `get` and `update` functions to focus on the list shape computation, which is expressed through the type alias `Output`. The first case is easy: the shape of list which we need to hold the leaves of a tree of type `Leaf[A]` is the one that allows us to store a single element of type `A`, i.e. `Cons[A, Nil[A]]`. For arbitrary node trees, the situation is in appearance more complicated, though conceptually simple. Given a tree of shape `Node[L, A, R]`, we first need to know the list shapes for the left and right subtrees `L` and `R`. The implicit arguments `ShapeL` and `ShapeR` provide us with the `LOut` and `ROut` shapes. The resulting list shape will be precisely their concatenation, which we achieve through an auxiliary type-level function `Concatenate` (not shown for brevity, but implemented in a similar way). The shape concatenation will be accessible through the `Out` type member of that function. The `Conc.Out` type is an example of a path-dependent type, a truly dependent type since it depends on the value `Conc` obtained through the implicits mechanism.
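
As a rough reconstruction of that auxiliary function (ours, not necessarily the repository’s), `Concatenate` might be declared and implemented as follows:

```scala
trait Concatenate[A, L1 <: List[A], L2 <: List[A]]{
  type Out <: List[A]
  def apply(l1: L1, l2: L2): Out
}

object Concatenate{
  type Output[A, L1 <: List[A], L2 <: List[A], _Out] =
    Concatenate[A, L1, L2]{ type Out = _Out }

  // Concatenating the empty list is the identity on the second list
  implicit def nilCase[A, L2 <: List[A]]: Output[A, Nil[A], L2, L2] =
    new Concatenate[A, Nil[A], L2]{
      type Out = L2
      def apply(l1: Nil[A], l2: L2): L2 = l2
    }

  // Otherwise, keep the head and concatenate the tails recursively
  implicit def consCase[A, T <: List[A], L2 <: List[A], TOut <: List[A]](
    implicit tailConc: Output[A, T, L2, TOut]
  ): Output[A, Cons[A, T], L2, Cons[A, TOut]] =
    new Concatenate[A, Cons[A, T], L2]{
      type Out = Cons[A, TOut]
      def apply(l1: Cons[A, T], l2: L2): Cons[A, TOut] =
        Cons(l1.head, tailConc(l1.tail, l2))
    }
}
```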

We are about to finish. The only thing that is needed is some way to call the `get` and `update` member functions of the `LeafsShape` type-level function, for a given tree value. We achieve that with two auxiliary definitions, located in a definitive `Leafs` module (where the type-level function and its companion object are also implemented):

```scala
class Leafs[A]{
  def get[In <: Tree[A]](t: In)(implicit S: LeafsShape[In]): S.Out =
    S.get(t)

  def update[In <: Tree[A]](t: In)(implicit S: LeafsShape[In]): S.Out => In =
    S.update(t)

  trait LeafsShape[In <: Tree[A]]{ ... }
  object LeafsShape{ ... }
}
```

The auxiliary functions `get` and `update` are the typesafe counterparts of the original signatures. The first difference that we may emphasise is that the type of input trees is not a plain, uninformative `Tree[A]`, but a particular shape of tree `In`. The compiler can then use this shape as input to the type-level function `LeafsShape`, to compute the shape of the resulting list `S.Out`. The output of these functions is thus declared as a path-dependent type. Last, note that the implementation of these functions is wholly delegated to the corresponding implementations of the inferred type-level function.

Let’s see how this works in the following REPL session:

```scala
scala> val tree = Node(Node(Leaf(1), 2, Leaf(3)), 4, Leaf(5))
tree: Node[Node[Leaf[Int], Int, Leaf[Int]], Int, Leaf[Int]] = ...

scala> get(tree)
res2: Cons[Int, Cons[Int, Cons[Int, Nil[Int]]]] = Cons(1,Cons(3,Cons(5,Nil())))

scala> update(tree).apply(Cons(5, Cons(3, Cons(1, Nil[Int]()))))
res1: Node[Node[Leaf[Int], Int, Leaf[Int]], Int, Leaf[Int]] = Node(Node(Leaf(5), 2, Leaf(3)), 4, Leaf(1))

scala> update(tree).apply(Cons(5, Nil[Int]()))
:22: error: type mismatch;
 found   : Nil[Int]
 required: Cons[Int, Cons[Int, Nil[Int]]]
       update(tree).apply(Cons(5, Nil[Int]())
                                  ^
```

As expected, when we pass lists of the right shape, everything works. On the contrary, as shown in the last example, if we pass a list of the wrong size, the compiler will complain. In particular, the error message tells us that it found a list of type `Nil[Int]` where it expected a list of size two. This is because `update(tree)` expects a list of three elements, and we only passed a list of size one. This is exactly the same behaviour that we got with the Agda implementation.

The Scala and Agda implementations seem very different. In Scala, we exploit the expressiveness of its type system to preserve the shape of algebraic data type values, and perform type-level, shape-dependent computations at compile time. In Agda, we exploit its capability to declare full-fledged dependent types, and perform value-level computations at compile time.

Nonetheless, let’s recall the signatures of both implementations and try to reconcile their differences:

```agda
-- AGDA VERSION
module Leafs where
  open import Data.Vec
  open Trees

  get : {A : Set} -> (s : Tree A) -> Vec A (n_leaves s)
  get = ?

  update : {A : Set} -> (s : Tree A) -> Vec A (n_leaves s) -> Tree A
  update = ?
```

```scala
// SCALA VERSION
class Leafs[A]{
  def get[In <: Tree[A]](t: In)(implicit S: LeafsShape[In]): S.Out =
    S.get(t)

  def update[In <: Tree[A]](t: In)(implicit S: LeafsShape[In]): S.Out => In =
    S.update(t)

  trait LeafsShape[In <: Tree[A]]{
    type Out <: List[A]
    def get(t: In): Out
    def update(t: In): Out => In
  }
  object LeafsShape{ ... }
}
```

In a sense, the Scala signature is simpler: there is no need to use a different type `Vec (A : Set) (n : ℕ)`. The very same algebraic data type `List[A]` (albeit implemented in a shape-aware fashion) and subtyping suffice for representing shapes. In Agda, the new vector type is introduced precisely to represent the shapes of lists, which are in one-to-one correspondence with the natural numbers.

The `n_leaves` function is then used to compute the required shape for a given tree. In Scala, there is no particular need for that, since the shape is computed along with the implementation of the `get` and `update` functions in the type-level function `LeafsShape`.

The downside of the Scala implementation is, evidently, its verbosity and the number of techniques and tricks involved: path-dependent types, traits, subtyping, implicits, auxiliary functions, … This is a recognised problem which is being tackled in the future Scala 3.0 version.

We could have mimicked the Agda implementation style in Scala. In the `shapeless` framework, for instance, we have available the `Sized` and `Nat` types to represent lists of a fixed size (see the implementation here), and we may even use literal types to overcome the limitation of using values in type declarations. Alternatively, we proposed an implementation fully based on shape-aware algebraic data types. This version is, in our opinion, a more idiomatic way to solve our particular problem in Scala. It also allows us to grasp the idiosyncrasy of Scala with respect to competing approaches like the one proposed in Agda. In this regard, we found the notion of *shape* to be extremely useful.

In upcoming posts we will likely go on exploring Agda in one of its most characteristic applications: certified programming. For instance, we may generalise the example shown in this post and talk about *traversals* (a kind of optic, like lenses) and their laws. One of these laws, applied to our example, tells us that if you update the leaves of the tree with its current leaf values, you will obtain the same tree. Using Agda, we can state that law and *prove* that our implementation satisfies it. No need to enumerate test cases, or empirically test the given property (e.g., as in ScalaCheck). Till the next post!
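
Expressed against the plain (non-shape-aware) signatures from the beginning of this post, and assuming a ScalaCheck `Arbitrary[Tree[Int]]` generator (not shown here), that law could be sketched as:

```scala
import org.scalacheck.Prop.forAll

val leafs = new Leafs[Int]

// Updating a tree with its current leaves gives back the very same tree,
// with no values left over
val getUpdateLaw = forAll { (t: Tree[Int]) =>
  leafs.update(t)(leafs.get(t)) == Some((Nil, t))
}
```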

We’re really excited about it, because we’ll be introducing so-called *optic algebras*. Optic algebras emerge to overcome existing limitations in the standing techniques to handle the data layer of real-world applications. On the one hand, *optics* support rich patterns to manipulate data, but they’re restricted to immutable data structures. On the other hand, algebraic abstractions such as *MonadState* provide the means to work with general settings (relational databases, microservices, etc.), but their associated patterns are really poor. Optic algebras attempt to supply rich patterns while remaining general, therefore combining the best of both worlds.

We’ll take this opportunity to premiere our new Scala library, which we’ve affectionately named *Stateless*. This library exploits the notion of optic algebra and aims at making it easier to deal with the state of your applications. In this sense, you could implement the data layer of your application and its business logic once and for all, using the domain specific language provided by Stateless, and later interpret it into particular state-based technologies. This library thus complements other open source efforts of Habla Computing (our functional architecture studio) such as puretest, to contribute to the functional ecosystem of Scala.

We look forward to seeing you in #scalax!

The program above has several advantages over an impure one, given that it completely separates the business logic (the WHAT) from the interpretation (the HOW). This gives us plenty of room for manoeuvre, since we can change the whole deployment infrastructure without having to change the logic in any way. In other words, business logic changes affect only business logic code, and infrastructure changes affect only interpreters (provided that neither of these changes affects the DSL, of course). Some changes of interpretation could be, for instance, running the program using `Future`s in an asynchronous way, or running it as a pure state transformation for testing purposes using `State`.

Now, you might be wondering: is OOP capable of achieving this level of **declarativeness**? In this post, we will see that we can indeed do purely functional programming in a purely object-oriented style. However, in order to do so, the conventional techniques that we normally employ when doing OOP (plain abstract interfaces) won’t suffice. What we actually need are more powerful techniques for building truly declarative interfaces.

In OOP, the most common way to achieve declarativeness is by using plain abstract interfaces, of course. In a similar way to the GADT approach, we can acknowledge four parts of this design pattern:

- Interface/API
- Method/Program over that interface
- Concrete instances
- Composition

Here is a very illustrative diagram of this approach:

However, this is just one step towards declarativeness; it separates WHAT and HOW a little bit, since `IO` is an abstract interface, but we still have a very limited range of possible HOWs. This is quite easy to prove just by giving a couple of interpretations we **cannot** implement: for instance, asynchronous and pure state transformations. In the former case, we can’t simply implement the `IO` signature in an asynchronous way, since this signature forces us to return *plain* values, i.e. a value of type `String` in the `read` case, and a value of type `Unit` in the `write` case. If we attempt to implement this API in an asynchronous way, we will eventually get a `Future[String]` value, and we will have to convert this promise to a plain `String` by blocking the thread and waiting for the asynchronous computation to complete, thus rendering the interpretation absolutely synchronous.

```scala
object asyncInstance extends IO {
  def write(msg: String): Unit =
    Await.result(/* My future computation */, 2 seconds)
  def read: String = /* Idem */
}
```

Similarly, a state-based interpretation won’t be possible. In sum, if we want an asynchronous or a pure state transformer behaviour for our programs, we would have to change the original interface to reflect those changes and come up with two new APIs:

```scala
trait IO { // Async
  def write(msg: String): Future[Unit]
  def read(): Future[String]
}

trait IO { // Pure state transformations
  def write(msg: String): IOState => (IOState, Unit)
  def read(): IOState => (IOState, String)
}
```

This is clearly not desirable, since these changes in the API will force us to rewrite all of the business logic that rests upon the original `IO` API. Let’s go ahead and start improving our OOP interfaces towards true declarativeness. As we’ve seen in this pattern, we can distinguish between the abstract world (interface and interface-dependent method) and the concrete world (interface instance and composition).

We may notice that there are not many differences among the three interfaces we’ve shown so far. In fact, the only differences are related to the return type *embellishment* in each case. We can factor out these differences and generalise a common solution for all of them; we just need to write our interface in such a way that the instructions (methods) don’t return a plain value, but a value wrapped in a generic type constructor: the so-called *embellishment*. From now on we will also call these embellishments **programs**, as they can be considered computations that will eventually return a result value (once the asynchronous computation completes, or when we enact the state transformation).

```scala
trait IO[P[_]] {
  def read: P[String]
  def write(msg: String): P[Unit]
}

// Console
type Id[A] = A
type SynchIO = IO[Id]

// Async
type AsyncIO = IO[Future]

// Pure state transformations
type State[A] = IOState => (IOState, A)
type StateIO = IO[State]
```

Wow! Our new *interface* is a *generic* interface and, more specifically, a **type class** that solves our declarativeness problem: we can now create interpreters (instances) for both asynchronous and state transformer computations, and for any other program you may think of.
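
For instance, an asynchronous interpreter — impossible with the plain interface — is now just another instance. A sketch, assuming the standard global execution context:

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.io.StdIn.readLine

// Each instruction returns a Future, i.e. an asynchronous program
implicit object asyncIO extends IO[Future] {
  def read: Future[String] = Future(readLine())
  def write(msg: String): Future[Unit] = Future(println(msg))
}
```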

We call this type of class-based APIs **functional APIs**, due to their ability to totally decouple business logic from interpretation. With our traditional interfaces we still had our business logic contaminated with HOW concepts, specifically with the limitation of always running in `Id[_]`. Now, we are truly free.

Ain’t it easy? Let’s see what we have so far. We have a type class that models IO languages. Those languages consist of two instructions, *read* and *write*, that return plain abstract programs. What can we do with this type class already?

```scala
def hello[P[_]](IO: IO[P]): P[Unit] =
  IO.write("Hello, world!")

def sayWhat[P[_]](IO: IO[P]): P[String] =
  IO.read
```

Not very impressive: we don’t have any problem building simple programs. But what about composition?

```scala
def helloSayWhat[P[_]](IO: IO[P]): P[String] = {
  IO.write("Hello, say something:")
  IO.read
} // This doesn't work as expected
```

Houston, we have a problem! The program above just reads the input but doesn’t write anything: the first instruction is just a pure statement in the middle of our program, hence it does nothing. We are missing some mechanism to combine our programs in an imperative way. Luckily for us, that’s exactly what monads do; in fact, monads are just another **functional API**:

```scala
trait Monad[P[_]] {
  def flatMap[A, B](pa: P[A])(f: A => P[B]): P[B]
  def pure[A](a: A): P[A]
}
```

Well, you won’t believe it, but we can already *define* every single program we had in our previous post. Emphasis on the word *define*, as we can do just that: *define* or *declare* in a pure way all of our programs; but we’re still in the *abstract world*, in our safe space, where everything is wonderful, modular and comfy.

```scala
def helloSayWhat[P[_]](M: Monad[P], IO: IO[P]): P[String] =
  M.flatMap(IO.write("Hello, say something:")){ _ =>
    IO.read
  }

def echo[P[_]](M: Monad[P], IO: IO[P]): P[Unit] =
  M.flatMap(IO.read){ msg =>
    IO.write(msg)
  }

def echo2[P[_]](M: Monad[P], IO: IO[P]): P[String] =
  M.flatMap(IO.read){ msg =>
    M.flatMap(IO.write(msg)){ _ =>
      M.pure(msg)
    }
  }
```

Ok, the previous code is pretty modular but isn’t very sweet. But with a little help from our friends (namely, context bounds, for-comprehensions, helper methods and infix operators), we can get closer to the syntactic niceties of the non-declarative implementation:

```scala
def helloSayWhat[P[_]: Monad: IO]: P[String] =
  write("Hello, say something:") >> read

def echo[P[_]: Monad: IO]: P[Unit] =
  read >>= write[P]

def echo2[P[_]: Monad: IO]: P[String] = for {
  msg <- read
  _   <- write(msg)
} yield msg
```

You can get the details of this transformation in the accompanying gist of this post.
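In essence, the gist introduces summoner-style helpers and infix operators along these lines (the names below are our own; check the gist for the actual details):

```scala
// Helper methods that pick up the IO instance from the implicit context
def read[P[_]](implicit IO: IO[P]): P[String] = IO.read
def write[P[_]](msg: String)(implicit IO: IO[P]): P[Unit] = IO.write(msg)

// Infix operators and for-comprehension support for any monad P
implicit class MonadOps[P[_], A](pa: P[A])(implicit M: Monad[P]) {
  def flatMap[B](f: A => P[B]): P[B] = M.flatMap(pa)(f)
  def map[B](f: A => B): P[B] = M.flatMap(pa)(a => M.pure(f(a)))
  def >>=[B](f: A => P[B]): P[B] = flatMap(f)
  def >>[B](pb: => P[B]): P[B] = flatMap(_ => pb)
}
```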

As we said, these are just pure program definitions, free of interpretation. Time to go to the real world! Luckily for us, interpreters of these programs are just instances of our type class. Moreover, our *console* interpreter will look almost the same as in the OOP version; we just need to specify the type of our programs to be `Id[_]` (in the OOP approach this was set implicitly):

```scala
// Remember, `Id[A]` is just the same as `A`
implicit object ioTerminal extends IO[Id] {
  def write(msg: String): Id[Unit] = println(msg)
  def read: Id[String] = readLine
}

implicit object idMonad extends Monad[Id] {
  def flatMap[A, B](pa: Id[A])(f: A => Id[B]): Id[B] = f(pa)
  def pure[A](a: A): Id[A] = a
}

def helloConsole(): Unit = hello[Id](ioTerminal)
def sayWhatConsole(): String = sayWhat(ioTerminal)
def helloSayWhatConsole(): String = helloSayWhat(idMonad, ioTerminal)
def echoConsole(): Unit = echo[Id]
def echo2Console(): String = echo2[Id]
```

So now, we can start talking about the type class design pattern. In the same way we did with the plain abstract interface design pattern, here is the diagram of this methodology:

Fine, we’ve seen two ways of defining pure, declarative programs (GADTs and functional APIs), and another one that unsuccessfully aims to do so (plain OOP abstract interfaces). What are the differences? Which one is better? Well, let’s answer the first question for now using the following table:

As you can see, the GADT style of doing functional programming (FP) favours data types (`IOEffect` and `Free`), whereas FP in an OO style favours APIs (`IO` and `Monad`); declarative functions in the GADT style return programs written in our DSL (`IOProgram`), whereas declarative functions in FP (OO style) are ad-hoc polymorphic functions; concerning interpretations, natural transformations used in the GADT style correspond simply to instances of APIs in OO-based FP; last, running our programs in the GADT style using a given interpreter just means plain old dependency injection in FP OO. As for the conventional OOP approach, you can see how it can be considered an instance of FP OO for the `Id` interpretation.

About the question of which alternative is better, GADTs or functional APIs, there’s no easy answer, but we can give some tips:

In favour of functional APIs:

- **Cleaner:** This approach implies much less boilerplate.
- **Simpler:** It’s easier to perform and it should be pretty familiar to any OOP programmer (no need to talk about GADTs or natural transformations).
- **Performance:** We don’t have to create lots of intermediate objects like the ADT version does.
- **Flexible:** We can go from functional APIs to GADTs at any time, just by giving an instance of the type class for the ADT-based program (e.g., `object toADT extends IO[IOProgram]`).

In favour of GADTs:

- **More control:** In general, ADTs allow for more control over our programs, due to the fact that we have the program represented as a value that we can inspect, modify, refactor, etc.
- **Reification:** If you somehow need to pass your programs around, or read programs from a file, then you need to represent programs as values, and for that purpose ADTs come in very handy.
- **Modular interpreters:** Arguably, we can write interpreters in a more modular fashion when working with GADTs, as, for instance, with the `Eff` monad.

We have seen how we can do purely functional programming in an object-oriented fashion using so-called functional APIs, i.e. using type classes instead of plain abstract interfaces. This little change allowed us to widen the type of interpretations that our OO APIs can handle, and to write programs in a purely declarative fashion. And, significantly, all of this was achieved while working in the realm of object-oriented programming! So, this style of doing FP, which is also known as MTL or tagless final, and is related to object algebras, is more closely aligned with OO programmers, and doesn’t require knowledge of abstractions alien to the OO world, such as GADTs and natural transformations. But we just scratched the surface, as this is a very large subject to tackle in one post. Some of the topics we may see in the future are:

- **Modular interpreters:** How to seamlessly compose interpreters using functional APIs is another large issue which is currently under investigation. A recent library that aims at this goal is mainecoon.
- **Church encodings:** In the GADT approach, declarative functions return *programs* that will eventually be interpreted, but with functional APIs we don’t see any such program value. In our next posts, we will see how the Church encoding allows us to reconcile these two different ways of doing FP.

Last, let us recommend this presentation where we talk about the issues of this post! All roads lead … to Lambda World. Also, you can find the code of this post here.

See ya!

```scala
// IO DSL
sealed trait IOProgram[A]
case class Single[A](e: IOProgram.Effect[A]) extends IOProgram[A]
case class Sequence[A, B](p1: IOProgram[A],
  p2: A => IOProgram[B]) extends IOProgram[B]
case class Value[A](a: A) extends IOProgram[A]

object IOProgram{
  sealed trait Effect[A]
  case class Write(s: String) extends Effect[Unit]
  case object Read extends Effect[String]
}

// Sample IO program
import IOProgram._

def echo(): IOProgram[String] =
  Sequence(Single(Read), (msg: String) =>
    Sequence(Single(Write(msg)), (_: Unit) =>
      Value(msg)))
```

However, while this design is essentially correct from the point of view of the functional requirements of our little application, and from the point of view of illustrating the essence of functional programming, there are two major flaws concerning two important non-functional guarantees: readability and modularity. Let’s start from the first one!

*Note: you can find the code for this post in this repo.*

What’s the problem with the little `echo` function we came up with? Well, this function, being pure, has an essential advantage: it simply declares *what* has to be done, and the task of actually executing those programs in any way we want is delegated to another part of the application – the interpreter. Thus, we could run our `echo()` IO program using the `println` and `readLine` methods of the `Console`; or using an asynchronous library with `Future` values; or test it, without the need of mocking libraries, with the help of custom state transformers in a type-safe way. Great, great, great! But … who would ever want to write our pure functions using that syntax? We have to admit that the readability of our little program is poor … to say the least. Let’s fix it!

We start by adding some lifting methods that allow us to use *IO instructions* as if they were *programs* already:

```scala
object IOProgram {
  object Syntax{
    def read(): IOProgram[String] = Single(Read)
    def write(msg: String): IOProgram[Unit] = Single(Write(msg))
  }
}
```

Next, let’s introduce some smart constructors for sequencing programs. We will name them `flatMap` and `map` — for reasons that will become clear very soon. As you can see in the following implementation, `flatMap` simply allows us to write sequential programs using an infix notation; and `map` allows us to write a special type of sequential program: one which runs some program, transforms its result using a given function, and then simply returns that transformed output.

```scala
sealed trait IOProgram[A]{
  def flatMap[B](f: A => IOProgram[B]): IOProgram[B] =
    Sequence(this, f)
  def map[B](f: A => B): IOProgram[B] =
    flatMap(f andThen Value.apply)
}
```

Using all these smart constructors we can already write our program in a more concise style:

```scala
import IOProgram.Syntax._

def echo: IOProgram[String] =
  read() flatMap { msg =>
    write(msg) map { _ =>
      msg
    }
  }
```

We may agree that the above version using smart constructors represents an improvement, but, admittedly, it’s far from the conciseness and readability of the initial impure version:

```scala
def echo(): String = {
  val msg: String = readLine
  println(msg)
  msg
}
```

For one thing at least: in case our program consists of a long sequence of multiple subprograms, we will be forced to write a long sequence of nested, indented `flatMap`s. But we can already avoid this using so-called *for-comprehensions*! This is a Scala feature which parallels Haskell’s *do* notation and F#’s *computation expressions*. In all of these cases, the purpose is being able to write sequential programs more easily. Our little example can now be written as follows:

```scala
import IOProgram.Syntax._

def echo(): IOProgram[String] = for{
  msg <- read()
  _   <- write(msg)
} yield msg
```

For-comprehensions are desugared by the Scala compiler into a sequence of `flatMap`s and a last `map` expression. So, the above program and the `flatMap`-based program written in the last section are essentially identical.

Let’s deal now with the second of our problems: the one concerning modularity. What’s the problem with the little DSL for writing IO programs we came up with? Basically, the problem is that, approximately, half of this data type is not related to input-output at all. Indeed, if we were to write a different DSL to write imperative programs dealing with file system effects (e.g. reading the content of some file, renaming it, etc.), we would almost write, line by line, half of its definition:

```scala
sealed trait FileSystemProgram[A]
case class Single[A](e: FileSystemProgram.Effect[A]) extends FileSystemProgram[A]
case class Sequence[A, B](p1: FileSystemProgram[A],
  p2: A => FileSystemProgram[B]) extends FileSystemProgram[B]
case class Value[A](a: A) extends FileSystemProgram[A]

object FileSystemProgram{
  sealed abstract class Effect[_]
  case class ReadFile(path: String) extends Effect[String]
  case class DeleteFile(path: String) extends Effect[Unit]
  case class WriteFile(path: String, content: String) extends Effect[Unit]
}
```
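
With this DSL in place we can already describe file-system programs; for instance, a hypothetical program (our own example, not from the post) that copies the content of one file into another:

```scala
import FileSystemProgram._

// Read the source file, then write its content to the destination
def copy(from: String, to: String): FileSystemProgram[Unit] =
  Sequence(Single(ReadFile(from)), (content: String) =>
    Single(WriteFile(to, content)))
```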

The only remarkable change is related to the kinds of effects we are dealing with now: file system effects instead of IO effects. The definition of the DSL itself simply varies in the reference to the new kind of effect. This amount of redundancy is a clear signal of a lack of modularity. What we need is a generic data type that accounts for the common imperative features of both DSLs. We can try it as follows:

```scala
sealed trait ImperativeProgram[Effect[_], A]{
  def flatMap[B](f: A => ImperativeProgram[Effect, B]) =
    Sequence(this, f)
  def map[B](f: A => B) =
    flatMap(f andThen Value.apply)
}
case class Single[Effect[_], A](e: Effect[A])
  extends ImperativeProgram[Effect, A]
case class Sequence[Effect[_], A, B](
  p1: ImperativeProgram[Effect, A],
  p2: A => ImperativeProgram[Effect, B])
  extends ImperativeProgram[Effect, B]
case class Value[Effect[_], A](a: A)
  extends ImperativeProgram[Effect, A]
```

Note how the `Single` variant of the DSL now refers to a (type constructor) parameter `Effect[_]`. We can now reuse the `ImperativeProgram` generic DSL in a modular definition of our DSLs for IO and file system effects:

```scala
type IOProgram[A] = ImperativeProgram[IOProgram.Effect, A]
type FileSystemProgram[A] = ImperativeProgram[FileSystemProgram.Effect, A]
```

This `ImperativeProgram` generic DSL seems pretty powerful: indeed, it encodes the essence of imperative DSLs, and it is actually commonly known by a much more popular name: the *free monad*! The definitions of `Free` that you will find in professional libraries such as cats, scalaz or eff are not quite the same as the one obtained in this post, which is quite inefficient both in time and space (not to mention further modularity problems when combining different types of effects); but the essence of free monads, namely being able to define imperative programs for *any* type of effects represented by some type constructor, is there. This substantially reduces the effort of defining an imperative DSL: first, the program definition collapses into a single type alias; second, we get the `flatMap` and `map` operators for free; and, similarly, although not shown in this post, we will also be able to simplify the definition of monadic interpreters (those that translate a given free program into a specific monadic data type, such as a state transformation, asynchronous computation, etc.), amongst many other goodies.

We may say that the essence of functional programming is *modularity*. Indeed, the defining feature of functional programming, namely *pure functions*, is an application of this design principle: they let us compose our application out of two kinds of modules: pure functions themselves that declare what has to be done, and interpreters that specify a particular way of doing it. In particular, interpreters may behave as translators, so that the resulting interpretations are programs written in a lower-level DSL, that also need to be interpreted. Eventually, we will reach the “bare metal” and the interpreters will actually bring the effects into the real world (i.e. something will be written in the screen, a file will be read, a web service will be called, etc.).

But besides pure functions, functional programming is full of many additional modularity techniques: parametric polymorphism, type classes, higher-order functions, lazy evaluation, datatype generics, etc. All these techniques, which were first conceived in the functional programming community, basically aim at allowing us to write programs with extra levels of modularity. We saw an example in this post: instead of defining imperative DSLs for implementing Input/Output and File System programs in a monolithic way, we were able to abstract away their differences and package their common part in a super reusable definition: namely, the generic imperative DSL represented by the Free monad. How did we do that? Basically, using parametric polymorphism (higher-kinds generics, in particular), and generalised algebraic data types (GADTs). But functional programming is so rich in abstractions and modularity techniques, that we may have even achieved a similar modular result using type classes instead of GADTs (in a style known as *finally tagless*). And this is actually what we will see in our next post. Stay tuned!

Undoubtedly, one of the most widespread algebraic structures in the functional programming community is *monoid*. Despite its simplicity, it turns out to be very powerful. Typically, in Scala type class libraries, monoid is represented as follows:

```scala
trait OMonoid[A] {
  def mzero(): A
  def mappend(a1: A, a2: A): A
}
```

This type class is what is known as the *object algebra interface*: an interface of an abstract factory to create expressions. It contains two methods, `mzero` and `mappend`, which correspond to the two operations that describe this particular algebra. Once we have created the algebra interface, we can provide instances for it, which are also known as *object algebras*. A common monoid instance is *sum*:

```scala
val sumOMonoid: OMonoid[Int] = new OMonoid[Int] {
  def mzero(): Int = 0
  def mappend(a1: Int, a2: Int): Int = a1 + a2
}
```

Once we have shown the object algebra fundamentals, it’s time to focus on F-algebras. This is a really simple abstraction that consists of a *Functor* `F[_]`, a *carrier* type `A` and an algebra *structure* (the function itself):

type FAlgebra[F[_], A] = F[A] => A

At first glance, this looks very different from the object algebra approach for monoids. However, as we will see, the translation is completely natural. Indeed, this representation just packs all the algebra operations into a unique function. Thereby, the major challenge here is to identify the corresponding functor for monoids, which is an algebraic data type with a representative for every operation conforming the algebra. We refer to it as the algebra signature:

```scala
sealed trait Σ[A]
case class MZero[A]() extends Σ[A]
case class MAppend[A](a1: A, a2: A) extends Σ[A]
```

Once the functor is defined, we can modularize Monoid as an F-algebra:

type FMonoid[A] = FAlgebra[Σ, A]

Finally, we could provide a *sum* instance for the brand new monoid representation, as we did with the previous approach:

```scala
val sumFMonoid: FMonoid[Int] = {
  case MZero() => 0
  case MAppend(a1, a2) => a1 + a2
}
```

We claim that `sumFMonoid` is isomorphic to `sumOMonoid`. In order to provide such evidence, we show the isomorphism between `OMonoid` and `FMonoid`:

```scala
val monoidIso = new (OMonoid <~> FMonoid) {
  val to = new (OMonoid ~> FMonoid) {
    def apply[A](omonoid: OMonoid[A]): FMonoid[A] = {
      case MZero() => omonoid.mzero()
      case MAppend(a1, a2) => omonoid.mappend(a1, a2)
    }
  }
  val from = new (FMonoid ~> OMonoid) {
    def apply[A](fmonoid: FMonoid[A]): OMonoid[A] = new OMonoid[A] {
      def mzero(): A = fmonoid(MZero())
      def mappend(a1: A, a2: A): A = fmonoid(MAppend(a1, a2))
    }
  }
}
```

*(*) Notice that we have ignored the monoid laws throughout the article for simplicity, but keep in mind that they constitute a fundamental part of every algebra.*

Given this situation, the question we should be asking is: *“What is the best algebra representation for us?”* Sadly, there’s no clear answer to this. On the one hand, there is F-algebra. Undoubtedly, this representation is more modular. In fact, it is used in projects such as Matryoshka, a library of recursion schemes that is able to generate fixed points for any `Functor`, or to define a generic catamorphism (or `fold`) *once and for all*, which works for any F-algebra. On the other hand, there is the object algebra representation, closer to the widespread *programming interfaces* that we can find in libraries such as scalaz or cats. Although not as modular as F-algebras, this representation is powerful enough to interpret algebra expressions with little effort. See this paper on shallow embedding to get a better intuition on that. Therefore, both representations do appear in prominent libraries of the functional programming community. Wouldn’t it be nice to have them coexisting?

**`@algebra` to Provide Conversions**

As we have seen in the previous section, turning object algebras into F-algebras (and vice versa) is straightforward. Besides, we noticed that both algebra representations are used in everyday programming. For all these reasons, we decided to code an experimental macro (`@algebra`) to make both representations live together. The macro annotation can be applied to an object algebra interface:

```scala
@algebra trait OMonoid[A] {
  def mzero(): A
  def mappend(a1: A, a2: A): A
}
```

This annotation removes boilerplate by automatically generating some F-algebra encodings. They should enable us to translate object algebras wherever an F-algebra is required. To check that behaviour, we’re going to invoke a Matryoshka *catamorphism* (`cata`) that requires an F-algebra as input parameter, but we’ll be implementing object algebras instead. Besides, we’ll be using the tiny language (num literals and multiplications) that is used in Matryoshka’s introduction, so we recommend the reader to glance at it before moving ahead. Then, you should be able to appreciate that `Expr` leads to the following object algebra interface:

```scala
@algebra trait ExprAlg[A] {
  def num(value: Long): A
  def mul(l: A, r: A): A
}
```

First of all, our macro has to generate the corresponding `Expr` signature. So, the auto-generated companion for `ExprAlg` will contain:

```scala
sealed abstract class Σ[_]
case class Num[A](value: Long) extends Σ[A]
case class Mul[A](l: A, r: A) extends Σ[A]
```

In this section from Matryoshka’s introduction, we see that it’s required to provide a functor for `Expr` prior to applying a `cata`. Our macro is able to derive the `Functor` instance for `Σ`, so we don’t have to worry about that. The document also shows an `eval` F-algebra, which we can easily translate to an object algebra:

```scala
implicit def eval = new ExprAlg[Long] {
  def num(value: Long) = value
  def mul(l: Long, r: Long) = l * r
}
```

Notice that we marked it as `implicit`, because it will be necessary for the next task, which is invoking the catamorphism over an expression. Firstly, we need to declare the expression to be folded, I mean, evaluated. We can copy `someExpr` as is, and it will compile smoothly, since the `Mul` and `Num` case classes are generated by the macro as well:

```scala
def someExpr[T](implicit T: Corecursive.Aux[T, Σ]): T =
  Mul(Num[T](2).embed,
      Mul(Num[T](3).embed,
          Num[T](4).embed).embed).embed
```

Finally, we can invoke the `cata`. As we noted previously, it requires an F-algebra as input. Thereby, we use the `FAlgebra` summoner, generated by the macro, which detects the implicit `eval` and turns it into a compatible F-algebra to feed the function.

someExpr[Mu[Σ]].cata(FAlgebra[Long]) // ⇒ 24

To sum up, we applied our macro annotation to `ExprAlg` to generate some utilities to deal with F-algebras. Then, we defined our `eval` as an object algebra. As the generated encodings knew how to turn it into an F-algebra, we could invoke Matryoshka’s `cata` with this algebra safely. Thus, we reached our objective of making both representations coexist nicely.

Today, we have seen `OMonoid[A]` and `ExprAlg[A]` as algebra examples, both demanding a concrete type parameter. However, there are algebras that are parameterised by a type constructor. Take `Monad[F[_]]` as an example. In this particular situation, we can’t generate isomorphisms with F-algebras as we know them. Instead, we have to deal with F-algebras for higher-order functors. Our macro `@algebra` is able to detect GADTs and generate the corresponding encodings. This is still very experimental, but you can find an example here.

By now, we have placed `@algebra` in *azucar* (Spanish word for “sugar”), a library where we plan to deploy more utilities to deal with (co)algebras. If you have some feedback or suggestions to improve it, we’d be very glad to hear from you. Anyway, we hope you’ve enjoyed reading!

In the following paragraphs, you’ll find a series of impure IO programs that represent purification challenges. Compared to the simple “Hello, world!” program, these new programs feature additional IO instructions and an increasingly complex control flow structure. These extra levels of complexity will force us to enrich the simple `IOProgram` DSL and interpreter that we created to purify the simple “Hello, world!” program. For ease of reference, we reproduce that initial purification below:

```scala
object HelloWorld{
  /* Impure program */
  def helloWorld: Unit = println("Hello, world!")

  /* Functional solution */
  object Fun {
    // Language
    type IOProgram = Print
    case class Print(msg: String)

    // Pure function
    def pureHello(): IOProgram = Print("Hello, world!")

    // Interpreter
    def run(program: IOProgram): Unit = program match {
      case Print(msg) => println(msg)
    }

    // Impure program (modularised)
    def hello() = run(pureHello())
  }
}
```

Our first challenge consists in purifying the following impure program:

def sayWhat: String = readLine

Similarly to the “Hello, world!” program, the “sayWhat” program consists of a single IO instruction. In this case, when the program is run it will immediately block until we type something in the console. Then, it will return the string typed, rather than `Unit`:

```scala
scala> SayWhat.sayWhat
(type "something")
res0: String = "something"
```

In order to purify any program we have to return pure values that represent a description of the logic we want to accomplish. In order to describe this logic we use the `IOProgram` DSL, but the current version of this DSL only offers a `Write` instruction, so we have to extend it with a new `Read` command:

```scala
type IOProgram[A] = IOEffect[A]

sealed trait IOEffect[A]
case class Write(msg: String) extends IOEffect[Unit]
case object Read extends IOEffect[String]
```

There are several things going on here:

- We chose to use an ADT (Algebraic Data Type) to represent our instructions. We created an *effect language* to perform IO operations, composed of two instructions, one to read from and the other to write to the console. And our programs consist of either one of these instructions.
- The new `Read` instruction is a `case object` because it is a singleton instance; there can only exist one and only one instance of `Read`, as it has no arguments.
- Another thing to point out is that we parameterized our `IOEffect` and `IOProgram` ADTs. We did that because we need to store the return type of our instructions somewhere in order to be able to implement the interpreter. So in this case we use a *phantom type* to carry that information over. Thus, the `IOEffect` algebraic data type is what is known as a Generalised Algebraic Data Type (GADT).

We got it. Now we can express the impure “sayWhat” program in a pure fashion as follows:

def pureSayWhat: IOProgram[String] = Read

As simple as that. But things become more complicated when we try to update our interpreter:

def run(program: IOProgram): ??? = ...

Now our programs are parameterized, so we need to change the signature a little bit; remember that the return type is stored in the program’s type parameter. We can use good old pattern matching to know which instruction we are dealing with (Scala’s support for pattern matching GADTs suffices in this case):

```scala
def run[A](program: IOProgram[A]): A = program match {
  case Write(msg) => println(msg)
  case Read => readLine
}
```

The only thing left to do is to reimplement our equivalent impure function in a modular fashion. I’ll leave the complete example below:

```scala
object SayWhat {
  /* Impure program */
  def sayWhat: String = readLine

  /* Functional solution */
  object Fun {
    // Language
    type IOProgram[A] = IOEffect[A]

    sealed trait IOEffect[A]
    case class Write(msg: String) extends IOEffect[Unit]
    case object Read extends IOEffect[String]

    // Pure Program
    def pureSayWhat: IOProgram[String] = Read

    // Interpreter
    def run[A](program: IOProgram[A]): A = program match {
      case Write(msg) => println(msg)
      case Read => readLine
    }

    // Composition
    def sayWhat: String = run(pureSayWhat)
  }
}
```

We’ll now start building programs with more than one instruction. In this case we are going to print something to the console and then read the user’s input.

```scala
def helloSayWhat: String = {
  println("Hello, say something:")
  readLine
}
```

As you can see, this is a common *imperative* program which can be read out aloud as follows: “first, do this; next, do this”.

So far, our program definition has been just a type alias of a single instruction, but now we want our programs to be able to represent two-instruction programs as well. It turns out our program definition must also be an ADT:

```scala
sealed trait IOProgram[A]
case class Single[A](e: IOEffect[A]) extends IOProgram[A]
case class Sequence[A, B](e1: IOProgram[A], e2: IOProgram[B]) extends IOProgram[B]

def pureHelloSayWhat: IOProgram[String] =
  Sequence(
    Single(Write("Hello, say something:")),
    Single(Read))
```

As you can see, our programs can now be made up of just a `Single` instruction or a `Sequence` of two programs.

We must now change the interpreter accordingly. In particular, we need two interpreters, one for programs and one for effects:

```scala
def runProgram[A](program: IOProgram[A]): A = program match {
  case Single(e) => runEffect(e)
  case Sequence(p1, p2) => runProgram(p1) ; runProgram(p2)
}

def runEffect[A](effect: IOEffect[A]): A = effect match {
  case Write(msg) => println(msg)
  case Read => readLine
}
```

The only thing left is to rewrite the impure program in a modular fashion:

def helloSayWhat: String = runProgram(pureHelloSayWhat)

In our next program we’ll complicate the control flow a little bit.

```scala
def echo: Unit = {
  val read: String = readLine
  println(read)
}
```

Note that the `println` instruction writes the result of the `read` operation. This is a behaviour we can’t describe with our program representation yet: the problem is that the `Sequence` case doesn’t allow us to use the result of the first program. We thus need somehow to represent context-dependent programs, i.e. programs that depend on the results of previous ones. Let’s fix that.

```scala
sealed trait IOProgram[A]
case class Single[A](e: IOEffect[A]) extends IOProgram[A]
case class Sequence[A, B](e1: IOProgram[A], e2: A => IOProgram[B]) extends IOProgram[B]

def pureEcho: IOProgram[Unit] =
  Sequence(
    Single(Read),
    read => Single(Write(read)))
```

As simple as that: the new version of `Sequence` carries as its second parameter a program that is allowed to depend on a value of type `A`, i.e. the type of values returned when the first program is interpreted. Of course, the intention is that the interpreter will apply this function to that precise value, as will be shown in the next section. By the way, does the signature of the new version of `Sequence` ring a bell?

As commented previously, the new version of the interpreter will simply need to modify the way in which sequenced programs are executed:

```scala
def runProgram[A](program: IOProgram[A]): A = program match {
  case Single(e) => runEffect(e)
  case Sequence(p, next) =>
    val res = runProgram(p)
    runProgram(next(res))
}
```

The last thing to do is to reimplement the impure function in a modular way by applying the interpreter to the result of the pure function.

def echo: Unit = runProgram(pureEcho)

There are still some impure IO programs we can’t represent with our current `IOProgram` ADT. In particular, think of imperative programs structured as follows: *“Do this program; then, do this other program, possibly taking into account the result of the last program; etc.; finally, return this value, possibly taking into account the results of the last steps.”* It’s the last step which can’t be represented. For instance, let’s consider the following program.

```scala
def echo(): String = {
  val read: String = readLine
  println(read)
  read
}
```

This program is similar to the last one, but this time we return the *String* read, i.e., a pure value. So let’s add this new functionality to our ADT.

```scala
sealed trait IOProgram[A]
case class Single[A](e: IOEffect[A]) extends IOProgram[A]
case class Sequence[A, B](e1: IOProgram[A], e2: A => IOProgram[B]) extends IOProgram[B]
case class Value[A](a: A) extends IOProgram[A]

def pureEcho: IOProgram[String] =
  Sequence(
    Single(Read),
    read => Sequence(
      Single(Write(read)),
      _ => Value(read)))
```

This is the final form of our ADT, whereby a program can be one of three:

- `Single`: A single instruction.
- `Sequence`: A sequence of context-dependent programs.
- `Value`: A pure value (e.g. a `String`, `Int`, `MyFancyClass`, etc.)

In order to update the interpreter, we just have to deal with our new type of IO programs.

```scala
def runProgram[A](program: IOProgram[A]): A = program match {
  case Single(e) => runEffect(e)
  case Sequence(p, next) =>
    val res = runProgram(p)
    runProgram(next(res))
  case Value(a) => a
}
```

As you can see, this interpretation is fairly easy: a program `Value(a)` just means “return `a`”, which is what our interpreter does.

Last, we compose interpreter and pure function as usual to obtain a modular version of the original impure program:

def echo: String = runProgram(pureEcho)

This post aimed at showing that no matter how complex your impure programs are, you can always design a DSL to represent those programs in a purely declarative way. In our case, the DSL for building IO programs we ended up with is pretty expressive. In fact, we can represent any kind of imperative control flow with it. Try it!

There are, however, two major flaws we still have to deal with. First, we have to admit that the readability of programs written in the final `IOProgram` DSL is … poor, to say the least. Second, there is a lot of boilerplate involved in the design of the `IOProgram` type. Indeed, no matter the type of DSL we are dealing with (based on IO instructions, file system operations, web service calls, etc.), if we need imperative features, we will need to copy & paste the same `Sequence` and `Value` cases. We leave the solution to these problems for the next and last post of this series!

**Edit:** All code from this post can be found here.