The Missing Middle: Systems as Languages

In “Building Bridges,” I describe a philosophy of software architecture based on the idea of systems that span theory (the “left bank” of the river) and computational reality (the “right bank” of the river). In this philosophy, the business value that we deliver is primarily in the “middle of the river,” so to speak. If theory governs one bank and practical constraints govern the other, there’s a significant gap between them that also requires design.

The most popular design methodology in the modern era is “Domain-Driven Design” (hereafter DDD), originally codified in a book of the same name¹ by Eric Evans. Like so many other philosophies, its prescriptions often fall apart in the face of reality. In particular, it treats persistence as part of the model and insists on the supremacy of domain experts’ language. Both positions often fail to cope with complex state requirements and with necessary parts of an implementation that don’t neatly fit “business perspectives.”²

Its core idea, that a system is a model of the business domain it is intended to serve, is good, though. Systems need to serve a purpose and present that purpose intelligibly to several different groups: the implementors, the business support team, the sales team, and customers. In other words, software systems need to tell a story to each of those classes of users. In the original DDD book, Evans captures this idea through a focus on what he calls “Ubiquitous language.” This language is intended to be sourced from the experts' understanding of the domain, supplemented only by “high-level organizing principles imposed on the model.”

The idea of a language as the organizing principle of the system is a great one, as far as it goes, but DDD relies too much on non-technical experts. To be frank, I have personally worked with only a few domain experts who used coherent and consistent language for their business domains. Even when they do, that language can be thrown into disarray when the software system acquires another purpose or needs to absorb a second system. An implementation can be strengthened by keeping domain experts’ input at arm’s length.

I think a more productive style of design is sourced from a secondary DDD text, Domain Storytelling¹. Its premise is that systems are collections of scenarios that consist of actors who act upon domain objects via activities. In other words, sentences that consist of subjects acting upon objects via verbs. I won’t restate domain storytelling’s propositions here in their entirety. You can get the gist of the system from their website, and the key insight is really the conception of a system as a set of scenarios built from nouns and verbs. Much of the rest is ceremony around that: eliciting the scenarios from domain experts and using diagrams to encode them. Like all such systems, it’s optimistic in some ways, impractical in others, and generally more formal than it needs to be. No matter: the important part is the language.

Language is powerful because it can be used to express the actions and logic of the system with no reference to anything outside. This is another weakness of DDD: although it acknowledges and emphasizes the power of domain language, it doesn’t make space for that language to be encoded without recourse to implementation details. A lot of Evans’ book is dedicated to discussing repositories, aggregates, bounded contexts, anticorruption layers, and other implementation concerns that are ultimately unrelated to the domain. As a “middle of the river” book, it has only false foundations (design patterns, specifically) and no cohesive way to model relationships with hardware and state.

Genuine theoretical models are based on math and proof. There are patterns that we can rely on, like algebraic data types, pure functions, state machines, and monadic error handling, but these exist outside of and independent of the language that we develop for the system. DDD and other “object-first” methodologies either ignore or reject this concept. The “patterns” must describe objects that the model subsumes, and behavior of the pattern objects cannot be truly inferred because patterns do not exist except as a name. Every AbstractFactory that you encounter may behave differently from any other, and still be perfectly defensible as an “abstract factory.” The name of the class offers no explanatory power to the language of the system. Robust theoretical constructs don’t have this problem, because they have mathematical properties that exist independently of their implementation.³
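As a small illustration of what “mathematical properties” means here, the sketch below states the three monad laws for OCaml's standard Result.bind as ordinary boolean functions. The validators positive and even are hypothetical and exist only to give the laws something to chain. The laws hold for any such functions, which is exactly what a pattern name cannot promise.

```ocaml
(* Hypothetical validators, used only to exercise the laws. *)
let positive n = if n > 0 then Ok n else Error "must be positive"
let even n = if n mod 2 = 0 then Ok n else Error "must be even"

(* Infix alias for the standard library's Result.bind (OCaml 4.08+). *)
let ( >>= ) = Result.bind

(* Left identity: wrapping a value in Ok and binding equals plain application. *)
let left_identity x = (Ok x >>= positive) = positive x

(* Right identity: binding with Ok leaves the result unchanged. *)
let right_identity r = (r >>= fun x -> Ok x) = r

(* Associativity: how chained steps are grouped does not matter. *)
let associativity r =
  ((r >>= positive) >>= even) = (r >>= fun x -> positive x >>= even)
```

Each function returns true for every input, including inputs that fail validation, so any code written against Result.bind can rely on that behavior without reading its implementation.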

To maintain the separation between the model and its foundations, almost all business logic should be written as pure functions⁴ in terms of simple data structures. How those data structures are constructed or what happens after a pure function is invoked is not the concern of the business logic. As an example, an apartment leasing system is built on a single core function:

let create_lease ~apartment ~tenants ~start_date ~length_months ~rent =
    let end_date = add_months start_date length_months in
    { apartment; tenants; start_date; end_date; rent }

Of course, this might be the culmination of a number of additional steps, but create_lease is the purpose of the system, and those additional steps should also be modeled in terms of data that’s “just there” and outputs that go “somewhere.” Every function will then serve as the verb in the language of the system, acting on the nouns embodied in the data structures. Impurity, commonly called “side effects” (or “non-determinism” in more formal resources), is pushed out to the edge of this language system and modeled within the type system. This is conceptually the same as the “functional core, imperative shell” design, with a bit more rigor. Let’s expand on the premise a little.
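First, a minimal sketch of the nouns that create_lease acts on, so that the function above compiles on its own. The record types and the add_months helper are assumptions made for illustration: the field names, the integer rent, and the simplified calendar arithmetic are not taken from any real leasing system.

```ocaml
(* Hypothetical nouns of the leasing language. *)
type date = { year : int; month : int; day : int } (* month is 1..12 *)
type tenant = { name : string }
type apartment = { building : string; unit_number : string }

type lease = {
  apartment : apartment;
  tenants : tenant list;
  start_date : date;
  end_date : date;
  rent : int; (* assumed: monthly rent in cents *)
}

(* Simplified month arithmetic: the day is carried over unchanged, so a
   real system would still need to clamp dates such as the 31st. *)
let add_months d n =
  let total = (d.year * 12) + (d.month - 1) + n in
  { d with year = total / 12; month = (total mod 12) + 1 }

(* The verb: pure, built only from its inputs. *)
let create_lease ~apartment ~tenants ~start_date ~length_months ~rent =
  let end_date = add_months start_date length_months in
  { apartment; tenants; start_date; end_date; rent }

(* Usage: a twelve-month lease starting 2024-03-01 ends on 2025-03-01. *)
let example =
  create_lease
    ~apartment:{ building = "Elm Court"; unit_number = "4B" }
    ~tenants:[ { name = "Ada" } ]
    ~start_date:{ year = 2024; month = 3; day = 1 }
    ~length_months:12 ~rent:150_000
```

Nothing here knows where the apartment record came from or where the lease goes next, which is the point of keeping the verb pure.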

In this system, we might receive an HTML form specifying that a tenant (whose data is included) is renting a specific apartment. That form is posted to a /create-lease endpoint, exposed by a web framework, and parsed by that same web framework into four important data structures: the tenant, the start date, lease length, and an apartment specifier. After these objects have been constructed from the user’s input, they can be passed into a workflow function that embodies a “sentence” of the system language. This workflow function can then use stateful functions and business logic as necessary to complete the process, eventually returning either a successful result or an explanatory error.
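The flow just described might be sketched as follows. Everything named here is hypothetical: the form is assumed to arrive as a list of key/value pairs rather than through any particular web framework, the field names are invented, and the stateful steps (is_available, save_lease) are passed in as plain functions so that the workflow itself stays ignorant of storage.

```ocaml
(* Hypothetical data built from the posted form. *)
type lease_request = {
  tenant_name : string;
  apartment_id : string;
  start_date : string; (* left as ISO text to keep the sketch short *)
  length_months : int;
}

type error =
  | Missing_field of string
  | Invalid_number of string
  | Apartment_unavailable of string

(* Binding operator over the standard Result.bind (OCaml 4.08+). *)
let ( let* ) = Result.bind

(* Look up a required, non-empty field in the decoded form. *)
let field name form =
  match List.assoc_opt name form with
  | Some v when v <> "" -> Ok v
  | _ -> Error (Missing_field name)

(* A required field that must also parse as a positive integer. *)
let int_field name form =
  let* raw = field name form in
  match int_of_string_opt raw with
  | Some n when n > 0 -> Ok n
  | _ -> Error (Invalid_number name)

(* Input side: turn raw text into the nouns of the language. *)
let parse_request form =
  let* tenant_name = field "tenant_name" form in
  let* apartment_id = field "apartment_id" form in
  let* start_date = field "start_date" form in
  let* length_months = int_field "length_months" form in
  Ok { tenant_name; apartment_id; start_date; length_months }

(* The workflow "sentence": returns a result or an explanatory error. *)
let lease_apartment ~is_available ~save_lease request =
  if not (is_available request.apartment_id) then
    Error (Apartment_unavailable request.apartment_id)
  else
    let () = save_lease request in
    Ok request
```

A caller would chain the two steps with Result.bind, so a malformed form never reaches the workflow and an unavailable apartment never reaches storage.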

In this architecture, there are five main subsystems:

The impure shell is the Coordinator. It depends on the domain language and system state and is the entry point to all of the system’s logic.
The results of the coordinator are processed and presented to the user as Output. This could be via a web service, CLI output, or a GUI, depending on how the system was invoked.
That invocation is handled by Input. This is the part of the system that routes web requests, displays HTML forms, or exposes command line arguments, parses the input, and dispatches to the appropriate coordinator methods.
The pure domain language is the Logic. It depends on nothing and is defined in terms of itself. If need be, it can be shared between multiple applications as a library because it is isolated from other concerns.
Persistence of all kinds is handled by the State subsystem. It exposes logical interfaces to the coordinator, but the coordinator is ignorant of the implementation details. It asks for what it needs, and the State subsystem gives it back—or fails.

As a mnemonic, you can think of this as the COILS architecture. Notice that its dependencies are all unidirectional:

graph LR
    Input --> Coordinator
    Output --> Coordinator
    Coordinator --> State
    Coordinator --> Logic

The level of abstraction that you choose to implement at each dependency boundary will vary depending on your system’s needs, but this architecture offers us a way to express the “middle of the river” domain language without getting tied up in the minutiae of implementation details and without the pathological flexibility of object-first modeling. The right bank is hidden behind the State interface, safely exposed to the coordinator through logically named functions, or handled completely out of the sight of the core system by the Input and Output subsystems. This is where sophisticated typing will usually be used: result types are used for error handling and optional results from the system state, and nondeterministic inputs and outputs may play a role in those subsystems. The domain logic will make use of relatively simple algebraic data types and functions.
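To make the dependency boundaries concrete, here is one possible shape for three of the five subsystems using OCaml modules. This is a sketch under stated assumptions, not a reference implementation: the module names, the STATE signature, and the in-memory table are all invented for illustration, and Input and Output are omitted because they depend heavily on how the system is invoked.

```ocaml
(* Logic: the pure domain language. It depends on nothing. *)
module Logic = struct
  type lease = { apartment_id : string; tenant : string; length_months : int }
  type error = Already_leased of string | Invalid_length of int

  let create_lease ~apartment_id ~tenant ~length_months =
    if length_months <= 0 then Error (Invalid_length length_months)
    else Ok { apartment_id; tenant; length_months }
end

(* State: a logical interface. The coordinator never sees the storage. *)
module type STATE = sig
  val find_lease : string -> Logic.lease option
  val save_lease : Logic.lease -> unit
end

(* Coordinator: the impure shell, written against Logic and STATE only. *)
module Coordinator (S : STATE) = struct
  let lease_apartment ~apartment_id ~tenant ~length_months =
    match S.find_lease apartment_id with
    | Some _ -> Error (Logic.Already_leased apartment_id)
    | None ->
      Result.map
        (fun lease -> S.save_lease lease; lease)
        (Logic.create_lease ~apartment_id ~tenant ~length_months)
end

(* One hypothetical State implementation: an in-memory table. *)
module Memory_state : STATE = struct
  let table : (string, Logic.lease) Hashtbl.t = Hashtbl.create 16
  let find_lease id = Hashtbl.find_opt table id
  let save_lease (l : Logic.lease) = Hashtbl.replace table l.Logic.apartment_id l
end

module App = Coordinator (Memory_state)
```

Swapping Memory_state for a database-backed module would not touch Logic or the Coordinator functor, and the result type carries both domain errors and state conflicts back to whatever Output layer is in use.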

My next post will be more concrete, demonstrating what our hypothetical leasing system might look like if it were really implemented. If the terms above were too much, consider reading about algebraic data types (ADTs) first. “Simple Algebraic Data Types” is a good introduction. It uses Haskell but should be readable. If that’s not good enough, Claude Opus gives reasonable explanations of this concept and you can choose the language you prefer.

¹ An Amazon Affiliate link.
² I once worked on a system where funds were made available to the user at some point in the future. The domain experts called these “deposits,” and were very concerned about the complexity of making sure that “future deposits” didn’t get spent early. I solved this by calling them “fund allocations” and moving them into a ledger as a credit on the effective date, so that they were never debited before they had been credited. This also neatly fixed a problem with the previous system, which had never been able to reliably offer running balances.
³ Another way to look at it is that by grounding part of the system’s design in computer science, you can take advantage of others’ work on making sure that those ideas are actually usable in the context of the real world. Even if you interpret the beginnings of the “design patterns” movement charitably, its proponents did not do the leg work to make sure that this was true for all of the patterns they described, so object-oriented design in DDD and other methodologies often becomes sadly misshapen from its original architecture simply because that architecture wasn’t fit for purpose.
⁴ Pure functions are those that act only on their inputs and produce only their outputs. They have no “side effects” on the real world.