Facets as Composable Extension Points

Tuesday, June 7, 2022 architecture codemirror

An extensible system, at its base, is a system that allows people to add functionality that was not anticipated by the core system.

A good extensible system also makes sure multiple extensions that don't know anything about each other can be combined, and compose in ways that don't cause problems.

The problem has several aspects.

Composition: Multiple extensions attaching to a given extension point must have their effects combined in a predictable way.
Precedence: In cases where combining effects is order-sensitive, it must be easy to reason about and control the order of the extensions.
Grouping: Many extensions will need to attach to a number of extension points, or even pull in other extensions that they depend on.
Change: The effect produced by extensions may depend on other aspects of the system state, or be explicitly reconfigured.

This post tries to explain the approach that CodeMirror, a code editor library, takes to solving this problem.

A facet, in this system, defines an extension point. It takes any number of input values and produces an output value. Examples of facets are...

Event handlers, where individual extensions can define functions that handle a given event.
Editor configuration, like the tab size and whether content is read-only.
The set of markers to style the content with (for example syntax highlighting).
The set of gutters to show next to the content.

When defining an editor state, you pass in a collection of facet input values, which together define the behavior of the editor. In a given state, each facet has zero or more inputs. Their output value somehow combines these—it may simply be an array of input values, or some other function of them.

Facets are defined as values and (optionally) exported so that third-party code can provide inputs. The core system defines a number of facets, but facets defined outside it work exactly the same as those defined by the core.
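
To make the idea concrete, here is a minimal sketch of a facet as a value that combines inputs into an output. It is a toy model, not the code in CodeMirror: the names ToyFacet, FacetInput, and read are made up for this illustration, and the default tab size of 4 is an arbitrary choice.

```typescript
// Toy model only: this is not the @codemirror/state implementation.
// ToyFacet, FacetInput, and read are hypothetical names.
interface FacetInput<Input> {
  facet: ToyFacet<Input, unknown>;
  value: Input;
}

class ToyFacet<Input, Output> {
  readonly combine: (inputs: readonly Input[]) => Output;
  constructor(combine: (inputs: readonly Input[]) => Output) {
    this.combine = combine;
  }
  // Tag a value as an input for this facet.
  of(value: Input): FacetInput<Input> {
    return { facet: this, value };
  }
}

// Collect the inputs a configuration provides for a facet and combine them.
function read<Input, Output>(
  config: readonly FacetInput<any>[],
  facet: ToyFacet<Input, Output>
): Output {
  const inputs = config
    .filter(input => input.facet === facet)
    .map(input => input.value as Input);
  return facet.combine(inputs);
}

// A facet whose output is the first input, or a default when there is none.
const tabSize = new ToyFacet<number, number>(values => values[0] ?? 4);
// A facet whose output is simply the array of its inputs.
const gutters = new ToyFacet<string, readonly string[]>(values => values);

const config = [gutters.of("line-numbers"), tabSize.of(2), gutters.of("fold")];
const size = read(config, tabSize);
const shown = read(config, gutters);
```

Because a facet is just a value, a third-party module could export its own ToyFacet and other code could provide inputs to it in exactly the same way.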

Precedence

Often input values need a well-defined order. For event handlers, this determines which handlers get to go first, for example. For gutters, it defines the order in which they are displayed, and so on.

The order in which the facet values are provided when configuring the editor state provides a predictable ordering for the facet inputs, and is used as a basis for precedences. So if you provide two handlers for a given event, the one that you provide first will take precedence.

But sometimes the code that defines a facet value knows that it should have a given precedence, and you don't want to be dependent on the programmer using this extension to get the relative order right. For cases like this, the system also supports explicit precedence tagging, which assigns one of five (“highest” to “lowest”) precedence categories to a given extension. The actual precedence of inputs is determined first by category, then by order.
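
The "first by category, then by order" rule can be sketched as a sort on two keys. This is an illustration under assumptions rather than the library's code: Tagged and orderByPrecedence are hypothetical names, and the three middle category names are chosen here to fill in the five levels between "highest" and "lowest".

```typescript
// Toy model only: hypothetical names, not the @codemirror/state implementation.
type Category = "highest" | "high" | "default" | "low" | "lowest";

// Lower rank means earlier in the resulting order.
const rank: Record<Category, number> = {
  highest: 0,
  high: 1,
  default: 2,
  low: 3,
  lowest: 4,
};

interface Tagged<T> {
  value: T;
  category: Category;
}

// Order inputs by category first, then by the position they were provided in.
function orderByPrecedence<T>(inputs: readonly Tagged<T>[]): T[] {
  return inputs
    .map((input, index) => ({ input, index }))
    .sort(
      (a, b) =>
        rank[a.input.category] - rank[b.input.category] || a.index - b.index
    )
    .map(entry => entry.input.value);
}

const handlers = orderByPrecedence([
  { value: "userHandler", category: "default" },
  { value: "fallbackHandler", category: "lowest" },
  { value: "secondUserHandler", category: "default" },
  { value: "overrideHandler", category: "high" },
]);
// handlers is now:
// ["overrideHandler", "userHandler", "secondUserHandler", "fallbackHandler"]
```

The two untagged handlers keep their relative order, while the tagged ones move to the front or back regardless of where the programmer put them.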

Grouping

A given extension often needs to provide multiple facet values. For example, a code folding system needs to define a state field to hold information on what is currently folded, key bindings to control folding, a gutter to display fold markers, and a CSS module to style its UI elements.

To make this easy, extensions can be provided as arbitrarily deeply nested arrays. A function exported from an extension module can return an array of extensions, which can be included in a bigger configuration by just putting the result of calling that function alongside other extensions in the array used to define the editor state.

The actual ordering of the extensions is created by recursively flattening this array, resulting in a single array of input values, each tagged with a facet. These are then reordered based on explicitly assigned precedence categories and split by facet to provide the actual inputs for a given facet.
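
The flattening and splitting steps might look roughly like the sketch below. The types and function names (Leaf, Extension, flatten, groupByFacet, foldExtension) are invented for this example, facets are identified by plain strings to keep it short, and the reordering by precedence category is left out.

```typescript
// Toy model only: hypothetical names, not the @codemirror/state implementation.
type Leaf = { facet: string; value: unknown };
type Extension = Leaf | Extension[];

// Recursively flatten a nested extension into a single ordered array.
function flatten(ext: Extension, out: Leaf[] = []): Leaf[] {
  if (Array.isArray(ext)) {
    for (const child of ext) flatten(child, out);
  } else {
    out.push(ext);
  }
  return out;
}

// Split the flat array by facet, keeping the order within each facet.
function groupByFacet(leaves: readonly Leaf[]): Map<string, unknown[]> {
  const groups = new Map<string, unknown[]>();
  for (const leaf of leaves) {
    const list = groups.get(leaf.facet) ?? [];
    list.push(leaf.value);
    groups.set(leaf.facet, list);
  }
  return groups;
}

// An extension module exports a function that returns a (nested) array.
function foldExtension(): Extension {
  return [
    { facet: "keymap", value: "fold-keys" },
    [{ facet: "gutter", value: "fold-gutter" }],
  ];
}

const config: Extension = [
  { facet: "gutter", value: "line-numbers" },
  foldExtension(),
  { facet: "keymap", value: "default-keys" },
];

const inputs = groupByFacet(flatten(config));
// inputs.get("gutter") is ["line-numbers", "fold-gutter"]
// inputs.get("keymap") is ["fold-keys", "default-keys"]
```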

Deduplication

Because different extensions may depend on each other, and thus include each other's extension trees in their own extension tree, it becomes likely that people will end up with duplicated extensions in their configuration. For example, both the line numbers extension and the fold gutter extension might use an extension that defines editor gutter infrastructure.

Because it can be wasteful or even break things to actually include such shared dependencies multiple times, CodeMirror's extension system deduplicates extensions by identity—if the same extension value occurs multiple times in a configuration, only the one in the highest-precedence position is used.

As long as extensions that run the risk of accidentally being used multiple times take care to statically define their extension objects, and always return the same object, this makes sure such shared dependencies don't cause problems. Things like extension configuration, which might be different across uses of the extension, can often be put in a separate facet, which combines the parameters given by multiple users in some reasonable way, or raises an error if they conflict.
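
A rough sketch of deduplication by identity follows. The names (Leaf, flattenUnique, gutterInfrastructure) are hypothetical, and since this toy model has no precedence categories, "highest-precedence position" is simplified to "first occurrence in the flattened order".

```typescript
// Toy model only: hypothetical names, not the @codemirror/state implementation.
class Leaf {
  readonly name: string;
  constructor(name: string) {
    this.name = name;
  }
}
type Extension = Leaf | Extension[];

// Flatten the tree, keeping only the first occurrence of each object.
// Comparison is by identity (Set membership), not by structure.
function flattenUnique(
  ext: Extension,
  seen: Set<Leaf> = new Set<Leaf>(),
  out: Leaf[] = []
): Leaf[] {
  if (Array.isArray(ext)) {
    for (const child of ext) flattenUnique(child, seen, out);
  } else if (!seen.has(ext)) {
    seen.add(ext);
    out.push(ext);
  }
  return out;
}

// Shared dependency: defined once, statically, so every user gets the same object.
const gutterInfrastructure = new Leaf("gutter-infrastructure");

const lineNumbers = (): Extension => [gutterInfrastructure, new Leaf("line-numbers")];
const foldGutter = (): Extension => [gutterInfrastructure, new Leaf("fold-gutter")];

const names = flattenUnique([lineNumbers(), foldGutter()]).map(leaf => leaf.name);
// names is ["gutter-infrastructure", "line-numbers", "fold-gutter"]
```

If gutterInfrastructure were created inside each function instead, the two copies would be different objects and both would end up in the result, which is why the extension object has to be defined statically.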

Reconfiguration

Some of the inputs to facets might change over the lifetime of an editor. And just creating a fully new editor state with a new configuration may lose information (say, the undo history) contained in that state.

Thus, existing states can be reconfigured. The system supports two kinds of reconfiguration: full reconfiguration, where the root of the extension tree is replaced with a completely new set of extensions, or compartment reconfiguration, where you tag part of your initial extension tree as a compartment, and then later replace only that part of the tree.

In either case, the data-driven approach to configuration (the code can compare the old and the new inputs) allows the system to preserve parts of the state that didn't change, and update the values of facets whose inputs did change.
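
Compartment reconfiguration can be pictured as resolving the same extension tree again with different content substituted for the tagged part. The sketch below is a toy model with hypothetical names (ToyCompartment, Wrapped, resolve); it only shows how the untouched inputs remain identical between the two resolutions, which is what makes comparing old and new inputs cheap.

```typescript
// Toy model only: hypothetical names, not the @codemirror/state implementation.
type Leaf = { facet: string; value: unknown };
type Wrapped = { compartment: ToyCompartment; content: Extension };
type Extension = Leaf | Wrapped | Extension[];

class ToyCompartment {
  // Tag part of an extension tree so that it can be replaced later.
  of(content: Extension): Wrapped {
    return { compartment: this, content };
  }
}

// Flatten the tree, substituting replacement content for any compartment
// that has an entry in `contents`.
function resolve(
  ext: Extension,
  contents: Map<ToyCompartment, Extension>,
  out: Leaf[] = []
): Leaf[] {
  if (Array.isArray(ext)) {
    for (const child of ext) resolve(child, contents, out);
  } else if ("compartment" in ext) {
    resolve(contents.get(ext.compartment) ?? ext.content, contents, out);
  } else {
    out.push(ext);
  }
  return out;
}

const tabSizeCompartment = new ToyCompartment();
const root: Extension = [
  { facet: "readOnly", value: false },
  tabSizeCompartment.of({ facet: "tabSize", value: 4 }),
];

const before = resolve(root, new Map<ToyCompartment, Extension>());
const after = resolve(
  root,
  new Map<ToyCompartment, Extension>([
    [tabSizeCompartment, { facet: "tabSize", value: 2 }],
  ])
);

// The readOnly input is the very same object in both resolutions, so anything
// derived from it can be kept. Only the tabSize input differs.
const readOnlyKept = before[0] === after[0];
const tabSizeChanged = before[1] !== after[1];
```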

Dynamic Inputs

Systems with a complicated user interface tend to, at some point, grow some form of incremental computation support. They need to keep the things they show to the user consistent with their state, but their state is large and complicated, and can change in all kinds of ways.

A code editor is definitely a complicated user interface, and because it must be as responsive as possible, it has a strong need to avoid needless recomputation. Facets help with this. For a start, they avoid recomputing output values when the facet's inputs stay the same, so code that depends on the facet can do a quick identity-equality test on the facet's current output value to determine whether it changed.

But it is also possible to define dynamic inputs for facets, which provide an input value (or a set of values) that is computed from other facets or other aspects of the editor state. The state update system makes sure that, if any of the dependencies change, the input value is recomputed. If the new value is different from the old one, the facet value is also recomputed, as are any dynamic values that depended on that facet, and so on.
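
The two checks involved (did a dependency change, and did the recomputed value actually differ) can be sketched as follows. This is a simplified model under assumptions: ToyState, DynamicInput, and DynamicSlot are hypothetical names, the state has just two fields, and dependencies are state fields rather than other facets.

```typescript
// Toy model only: hypothetical names, not the @codemirror/state implementation.
interface ToyState {
  doc: string;
  readOnly: boolean;
}

// A dynamic input declares which state fields it depends on and how to
// compute its value from the state.
interface DynamicInput<T> {
  deps: readonly (keyof ToyState)[];
  compute: (state: ToyState) => T;
}

class DynamicSlot<T> {
  computations = 0; // counts how often compute actually ran
  private readonly input: DynamicInput<T>;
  private value: T;

  constructor(input: DynamicInput<T>, state: ToyState) {
    this.input = input;
    this.value = this.run(state);
  }

  private run(state: ToyState): T {
    this.computations++;
    return this.input.compute(state);
  }

  get current(): T {
    return this.value;
  }

  // Returns true only when the value changed, which is the signal that
  // dependents of this value would need to be recomputed in turn.
  update(prev: ToyState, next: ToyState): boolean {
    const depChanged = this.input.deps.some(dep => prev[dep] !== next[dep]);
    if (!depChanged) return false; // nothing to do
    const value = this.run(next);
    if (value === this.value) return false; // recomputed, but same result
    this.value = value;
    return true;
  }
}

const lineCount: DynamicInput<number> = {
  deps: ["doc"],
  compute: state => state.doc.split("\n").length,
};

const s0: ToyState = { doc: "a\nb", readOnly: false };
const slot = new DynamicSlot(lineCount, s0);

const s1: ToyState = { ...s0, readOnly: true }; // unrelated change
const changedByReadOnly = slot.update(s0, s1);

const s2: ToyState = { ...s1, doc: "x\ny" }; // dependency changed, same line count
const changedBySameCount = slot.update(s1, s2);

const s3: ToyState = { ...s2, doc: "x\ny\nz" }; // line count goes from 2 to 3
const changedByNewLine = slot.update(s2, s3);
```

Only the last update reports a change, so only then would values depending on the line count be recomputed.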

Representation

Because most facets, for a given configuration, have a static value, their representation can be optimized in a way that avoids doing any work on state updates. This is helpful, because the editor state tends to be updated multiple times per second, and we don't want to do any superfluous work during those updates.

When a given configuration is resolved, facets are categorized as either static or dynamic, depending on whether they have dynamic inputs. Each facet is assigned an address in either the static values array (which is reused as-is on any state update that doesn't change the configuration) or the dynamic values array. The latter is copied on state updates, and the dependency graph between facets (and other state fields) is used to determine which of the values need to be recomputed and which can be kept as they are.

Facets with no inputs at all aren't even stored in the state. When queried, the value that facet has with zero inputs can be looked up from the facet.
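
A simplified picture of this layout is sketched below. It is not the library's actual data structure: ToyFacet, ToyConfig, ToyState, and Address are names invented for the example, the static or dynamic classification is passed in by hand rather than derived from the inputs, and the dependency-driven recomputation of dynamic values is left out.

```typescript
// Toy model only: hypothetical names, not the @codemirror/state implementation.
class ToyFacet<T> {
  // The value the facet has when it has zero inputs.
  readonly emptyValue: T;
  constructor(emptyValue: T) {
    this.emptyValue = emptyValue;
  }
}

type Address = { kind: "static" | "dynamic"; index: number };

// A resolved configuration: one address per facet that has inputs.
class ToyConfig {
  readonly addresses = new Map<ToyFacet<unknown>, Address>();
  readonly staticValues: unknown[] = [];
  readonly initialDynamic: unknown[] = [];

  add<T>(facet: ToyFacet<T>, value: T, kind: Address["kind"]): this {
    const values = kind === "static" ? this.staticValues : this.initialDynamic;
    this.addresses.set(facet, { kind, index: values.length });
    values.push(value);
    return this;
  }
}

class ToyState {
  readonly config: ToyConfig;
  readonly dynamicValues: unknown[];

  constructor(
    config: ToyConfig,
    dynamicValues: unknown[] = config.initialDynamic.slice()
  ) {
    this.config = config;
    this.dynamicValues = dynamicValues;
  }

  facet<T>(facet: ToyFacet<T>): T {
    const address = this.config.addresses.get(facet);
    // Facets without inputs are not stored at all.
    if (!address) return facet.emptyValue;
    const values =
      address.kind === "static" ? this.config.staticValues : this.dynamicValues;
    return values[address.index] as T;
  }

  // An update that doesn't touch the configuration: the static array is
  // shared through the config, only the dynamic array is copied.
  update(): ToyState {
    return new ToyState(this.config, this.dynamicValues.slice());
  }
}

const tabSize = new ToyFacet<number>(4);
const lineCount = new ToyFacet<number>(0);
const readOnly = new ToyFacet<boolean>(false); // gets no inputs below

const config = new ToyConfig()
  .add(tabSize, 2, "static")
  .add(lineCount, 10, "dynamic");

const state = new ToyState(config);
const next = state.update();
```

Here next reads its static values from the same array as state, so the update does no work for tabSize at all, and a lookup of readOnly never touches either array.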

Marijn Haverbeke's blog (license)