Under the Rug: Hidden (but Essential) Complexity

I don’t like docopt. That makes me an outlier; most people I’ve talked to seem to think it makes command-line applications “easy.” They like its premise that, by writing documentation, you’re writing code. When you make a command-line application with docopt, you’ve automatically got the help text written and the module documented properly, right? Why bother with something verbose and object-oriented like argparse?

My objection is that you aren’t writing code, but you’re also not writing documentation, despite the name of the package. Instead, you’re writing in a domain-specific language embedded in a string within your application code. From Python’s point of view, that string has no structure: it’s not “code” until it’s parsed by the docopt parser, which is itself six hundred lines of procedural Python.

DSLs can be powerful and well designed:

SQL expresses relational algebra for querying databases.
XPath allows terse access to XML elements and properties.
yacc specifies grammars for parsing languages.

The common theme across all of these is that they understand themselves as languages, though, and commit to that perspective. SQL is an ISO standard, and implementation-specific documents usually include formal or nearly formal grammars of their structures. XPath is a W3C standard. yacc isn’t standardized, but it’s old, popular, and so well-documented that it became a de facto standard; the GNU Bison manual is a comprehensive reference.

The structure of these languages is a first-class citizen, designed to meet the needs of the domain. By contrast, bad DSLs take a superficial approach, often marked by a design that privileges the “happy path” and can’t cope with changes when they become necessary. Their “prettiness” falls apart when they meet use cases that weren’t considered in their design. For example, if you look at any code that uses docopt, you’ll see a lot of type handling after arguments have already been “parsed”:

# Among other declarations of options.
"""
  --count=<n>  The number of times to do something.
"""

# Within the application code (with sys imported).
count = arguments["--count"]

if count:
    count = int(count)
else:
    sys.exit("You must supply a count!")

These stem from gaps in the design of the language: how do you specify the type of data provided to an argument? How do you specify that an argument is required? How do you specify that a group of arguments is related? How do you specify defaults for positional arguments? How can you tell which part of your “docstring” is causing your arguments to parse incorrectly? (The answer to all of these is simply “you can’t.”)
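For contrast, here is a minimal sketch of how argparse lets you state those same things as structure rather than patching them in afterwards. Only `--count` comes from the example above; the `repeat` program name, the `target` positional, and the `--quiet`/`--verbose` flags are invented for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for a hypothetical 'repeat' command."""
    parser = argparse.ArgumentParser(prog="repeat")

    # The type and the "required" rule are declared with the option itself.
    parser.add_argument("--count", type=int, required=True,
                        help="The number of times to do something.")

    # A positional argument with a default value (hypothetical).
    parser.add_argument("target", nargs="?", default="world",
                        help="What to act on.")

    # Related options are grouped; these two cannot be combined (hypothetical).
    verbosity = parser.add_mutually_exclusive_group()
    verbosity.add_argument("--quiet", action="store_true")
    verbosity.add_argument("--verbose", action="store_true")

    return parser


args = build_parser().parse_args(["--count", "3"])
print(args.count, args.target)  # 3 world
```

Leaving out `--count`, or passing something that isn’t an integer, makes argparse print a usage error naming the offending argument and exit, with no hand-written checks in the application code.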

Another example is the Gherkin language for writing behavioral tests. Its marketing pitch is simplicity in expressing tests:

Feature: Guess the word

  Scenario: Maker starts a game
    When the Maker starts a game
    Then the Maker waits for a Breaker to join

  Scenario: Breaker joins a game
    Given the Maker has started a game with the word "silky"
    When the Breaker joins the Maker's game
    Then the Breaker must guess a word with 5 characters

The language, as defined in the “brochure,” consists largely of a few keywords: Feature, Scenario, Given, When, Then, and Background. In reality, though, the code behind real Cucumber features is a regular expression (that matches against the “English” sentence in the feature file) combined with a block of actual application code that performs the work. Here’s a more realistic example:

Scenario: Simple data from environment
    Given I use a fixture named "environment_plugin"
    Given I set the environment variables exactly to:
      | variable    | value           |
      | test        | Hello, World!   |
    When I successfully run `tiller -b . -v -n`
    Then a file named "test.txt" should exist
    And the file "test.txt" should contain "Hello, World!"

Can you find the definition of the “I use a fixture named” function? What about “I set the environment variables exactly to”? “I successfully run”? How are file existence and contents comparison being handled? Make no mistake: these “step definitions” are code (in a bad language) and can fail or behave in surprising ways just as any other code can, but with the added twist that their API is inscrutable and unsearchable.
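To make the “regular expression plus a block of code” structure concrete, here is a rough sketch of a step registry in Python. This is not Cucumber’s actual API; the `step` decorator, the `STEPS` list, and `run_step` are all hypothetical names, and the handlers return a description instead of touching the filesystem.

```python
import re

# Hypothetical registry of (compiled pattern, handler) pairs.
STEPS = []


def step(pattern):
    """Register a handler for sentences matching the given regex."""
    def register(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return register


@step(r'^a file named "([^"]+)" should exist$')
def check_file_exists(name):
    return f"check that {name} exists"


@step(r'^the file "([^"]+)" should contain "([^"]+)"$')
def check_file_contains(name, text):
    return f"check that {name} contains {text!r}"


def run_step(sentence):
    """Find the first pattern that matches and call its handler."""
    for pattern, handler in STEPS:
        match = pattern.match(sentence)
        if match:
            return handler(*match.groups())
    raise LookupError(f"no step definition matches: {sentence!r}")


print(run_step('a file named "test.txt" should exist'))
# check that test.txt exists
```

Notice that the sentence in the feature file never names `check_file_exists`. The only link between the two is a regex match at run time, which is why searching a codebase for a step’s definition is so awkward.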

A distinct, but related, problem is a language that seems generally simple and effective but which is badly defined. INI files are a good example of this type of failure. When a new project chooses to use a different format, usually one that requires a third-party parser, there’s inevitably an outcry: “Why not just use INI files? They’re so simple!” As a comparative feature matrix shows, though, this is a false simplicity. You can’t trust what you get across platforms, languages, or versions of INI parsers, because the language is so poorly defined that numerous dialects have sprung up. Using something like TOML may seem like an unnecessarily contrarian decision, but it frees you from concerns about the lack of structure in a worse format.
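You don’t even need two different libraries to see the dialect problem. The sketch below uses Python’s standard `configparser` with a made-up `[server]` section: the same text yields different values depending on whether the parser is configured to treat `;` as an inline comment marker.

```python
import configparser

# Made-up configuration text for illustration.
INI_TEXT = """
[server]
port = 8080 ; default port
debug = yes
"""

# Default settings: inline comments are not recognized.
default = configparser.ConfigParser()
default.read_string(INI_TEXT)

# Another dialect: ';' preceded by whitespace starts an inline comment.
lenient = configparser.ConfigParser(inline_comment_prefixes=(";",))
lenient.read_string(INI_TEXT)

print(repr(default["server"]["port"]))  # '8080 ; default port'
print(repr(lenient["server"]["port"]))  # '8080'

# Every value is a string until the application asks for a conversion.
print(repr(lenient["server"]["debug"]))        # 'yes'
print(lenient["server"].getboolean("debug"))   # True
```

Neither parser is wrong, since no specification says which reading is correct. Types have the same problem: `yes` only becomes a boolean because the calling code asked for one.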

There’s a reason that Niklaus Wirth entitled his most famous book Algorithms + Data Structures = Programs. The structures that your programs use and produce, implicit or explicit, are an integral part of those programs. They must be able to consistently represent the values and functions that you use, or they will push that complexity into other, less visible and less cohesive parts of the program. Don’t be fooled by a sales pitch that ignores necessary aspects of its target domain.