As of today, version 6.0 is stable. Going forward, probably for at least several years, all new releases will stay within the 6.x major version and remain backwards compatible.
The library has been usable and largely stable for over a year, with only minor breaking changes. I generally prefer to release late, to avoid letting too many regrettable mistakes slip into the stable release, where they would have to be kept indefinitely. Without a doubt there will be things that, a year from now, I will wish I had released differently, but by having users work with the code in production for a significant amount of time, a lot of small issues and sources of friction have been found and resolved before being set in stone.
Work on this system started four years ago, with Prototype Fund funding the initial work. It was announced publicly and crowd-funded a year after that, built out into a useable system in the two years after that, and refined and stabilized in the past year. During the first two years, I collaborated with Adrian Heine on the design and implementation of the initial system.
For some more background on the new library, see the blog posts on Lezer (the parser system), facets (the extension system), and collaborative editing. For an overview of the entire system, take a look at the system guide. For a rough summary of what changed in the interface since 5.x, see the migration guide.
A good extensible system also makes sure multiple extensions that don't know anything about each other can be combined, and compose in ways that don't cause problems.
The problem has several aspects.
Composition: Multiple extensions attaching to a given extension point must have their effects combined in a predictable way.
Precedence: In cases where combining effects is order-sensitive, it must be easy to reason about and control the order of the extensions.
Grouping: Many extensions will need to attach to a number of extension points, or even pull in other extensions that they depend on.
Change: The effect produced by extensions may depend on other aspects of the system state, or be explicitly reconfigured.
This post tries to explain the approach that CodeMirror (a code editor library) takes to solving this problem.
A facet, in this system, defines an extension point. It takes any number of input values and produces an output value. Examples of facets are...
Event handlers, where individual extensions can define functions that handle a given event.
Editor configuration, like the tab size and whether content is read-only.
The set of markers to style the content with (for example syntax highlighting).
The set of gutters to show next to the content.
When defining an editor state, you pass in a collection of facet input values, which together define the behavior of the editor. In a given state, each facet has zero or more inputs, and its output value combines these in some way—it may simply be an array of the input values, or some other function of them.
Facets are defined as values and (optionally) exported so that third-party code can provide inputs. The core system defines a number of facets, but facets defined outside it work exactly the same as those defined by the core.
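As a minimal sketch of what that looks like in practice—using the current @codemirror/state API, whose names may differ slightly from what this post originally described—here is a facet whose inputs are numbers and whose output is the smallest value provided (with a fallback when there are no inputs):

import {EditorState, Facet} from "@codemirror/state"

// Inputs are numbers; the output takes the smallest one, defaulting to 120.
const maxLineLength = Facet.define<number, number>({
  combine: values => values.length ? Math.min(...values) : 120
})

let state = EditorState.create({
  extensions: [maxLineLength.of(100), maxLineLength.of(80)]
})
console.log(state.facet(maxLineLength)) // 80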
Often input values need a well-defined order. For event handlers, this determines which handlers get to go first, for example. For gutters, it defines the order in which they are displayed, and so on.
The order in which the facet values are provided when configuring the editor state provides a predictable ordering for the facet inputs, and is used as a basis for precedences. So if you provide two handlers for a given event, the one that you provide first will take precedence.
But sometimes the code that defines a facet value knows that it should have a given precedence, and you don't want to be dependent on the programmer using this extension to get the relative order right. For cases like this, the system also supports explicit precedence tagging, which assigns one of five (“highest” to “lowest”) precedence categories to a given extension. The actual precedence of inputs is determined first by category, then by order.
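A rough sketch of what explicit precedence tagging can look like, assuming @codemirror/state's Prec helpers and @codemirror/view's keymap (the two handler functions are made up for illustration):

import {EditorState, Prec} from "@codemirror/state"
import {keymap} from "@codemirror/view"

const insertNewline = () => true   // hypothetical handlers
const completeEnter = () => true

let state = EditorState.create({
  extensions: [
    keymap.of([{key: "Enter", run: insertNewline}]),               // ordered by position
    Prec.high(keymap.of([{key: "Enter", run: completeEnter}])),    // runs first regardless of position
  ]
})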
A given extension often needs to provide multiple facet values. For example, a code folding system needs to define a state field to hold information on what is currently folded, key bindings to control folding, a gutter to display fold markers, and a CSS module to style its UI elements.
To make this easy, extensions can be provided as arbitrarily deeply nested arrays. A function exported from an extension module can return an array of extensions, which can be included in a bigger configuration by just putting the result of calling that function alongside other extensions in the array used to define the editor state.
The actual ordering of the extensions is created by recursively flattening this array, resulting in a single array of input values, each tagged with a facet. These are then reordered based on explicitly assigned precedence categories and split by facet to provide the actual inputs for a given facet.
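A sketch of such a bundle—the folding-related values here are simplified placeholders, not the real folding extension, but the shape is the point: the whole feature is exported as one value, an array that users drop next to their other extensions.

import {Extension, StateField} from "@codemirror/state"
import {EditorView, keymap} from "@codemirror/view"

// Hypothetical pieces of a folding feature
const foldState = StateField.define<number[]>({create: () => [], update: value => value})
const foldKeymap = keymap.of([{key: "Ctrl-Shift-[", run: () => true}])
const foldTheme = EditorView.theme({".cm-foldPlaceholder": {color: "#888"}})

// The whole feature as a single (nested-array) extension value
export function codeFolding(): Extension {
  return [foldState, foldKeymap, foldTheme]
}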
Because different extensions may depend on each other, and thus include each other's extension trees in their own, it becomes likely that people will end up with duplicated extensions in their configuration. For example, both the line number extension and the fold gutter extension might use an extension that defines the editor's gutter infrastructure.
Because it can be wasteful or even break things to actually include such shared dependencies multiple times, CodeMirror's extension system deduplicates extensions by identity—if the same extension value occurs multiple times in a configuration, only the one in the highest-precedence position is used.
As long as extensions that run the risk of accidentally being used multiple times take care to statically define their extension objects, and always return the same object, this makes sure such shared dependencies don't cause problems. Things like extension configuration, which might be different across uses of the extension, can often be put in a separate facet, which combines the parameters given by multiple users in some reasonable way, or raises an error if they conflict.
Some of the inputs to facets might change over the lifetime of an editor. And just creating a fully new editor state with a new configuration may lose information (say, the undo history) contained in that state.
Thus, existing states can be reconfigured. The system supports two kinds of reconfiguration: full reconfiguration, where the root of the extension tree is replaced with a completely new set of extensions, or compartment reconfiguration, where you tag part of your initial extension tree as a compartment, and then later replace only that part of the tree.
In either case, the data-driven approach to configuration (the code can compare the old and the new inputs) allows the system to preserve parts of the state that didn't change, and update the values of facets whose inputs did change.
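A sketch of compartment-based reconfiguration, assuming the @codemirror/state, @codemirror/view, and language package APIs:

import {Compartment, EditorState} from "@codemirror/state"
import {EditorView} from "@codemirror/view"
import {javascript} from "@codemirror/lang-javascript"
import {python} from "@codemirror/lang-python"

const language = new Compartment()
let view = new EditorView({
  state: EditorState.create({extensions: [language.of(javascript())]})
})

// Later: swap out only the language part, keeping the rest of the state (undo history, etc.)
view.dispatch({effects: language.reconfigure(python())})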
Systems with a complicated user interface tend to, at some point, grow some form of incremental computation support. They need to keep the things they show to the user consistent with their state, but their state is large and complicated, and can change in all kinds of ways.
A code editor is definitely a complicated user interface, and because it must be as responsive as possible, has a strong need to avoid needless recomputations. Facets help with this. For a start, they avoid recomputing output values when the facet's inputs stay the same, so code that depends on the facet can do a quick identity-equality test on the facet's current output value to determine whether it changed.
But it is also possible to define dynamic inputs for facets, which provide an input value (or a set of values) that is computed from other facets or other aspects of the editor state. The state update system makes sure that, if any of the dependencies change, the input value is recomputed—and, if it is different than its old value, the facet value is also recomputed, as are any dynamic values that depended on that facet, and so on.
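As an illustration—assuming the compute method that @codemirror/state facets expose—a dynamic input might look like this; it is recomputed only when the document changes, while other state updates reuse the old value:

import {Facet} from "@codemirror/state"

const largeDocument = Facet.define<boolean, boolean>({
  combine: values => values.some(v => v)
})

// An input derived from the document; "doc" declares the dependency.
const largeDocumentInput = largeDocument.compute(["doc"], state => state.doc.length > 100000)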
Because most facets, for a given configuration, have a static value, their representation can be optimized in a way that avoids doing any work on state updates. This is helpful, because the editor state tends to be updated multiple times per second, and we don't want to do any superfluous work during those updates.
When a given configuration is resolved, facets are categorized as either static or dynamic, depending on whether they have dynamic inputs. Each facet is assigned an address in either the static values array (which is reused as-is on any state update that doesn't change the configuration) or the dynamic values array. The latter is copied on state updates, and the dependency graph between facets (and other state fields) is used to determine which of the values need to be recomputed and which can be kept as they are.
Facets with no inputs at all aren't even stored in the state. When queried, the value such a facet has with zero inputs can be computed from the facet itself.
Roughly, this “beta” status means:
I actually like the current programming interface. Apart from the move from a mono-package to a group of separate packages (removing the next/ in the package names), no significant breaking changes are planned. There might still be breaking changes in peripheral parts of the interface, when real-world experience shows the current interface to be problematic, but these should be of the type where a quick scan of the release notes and some grepping through your code are enough to adapt to them.
The system is no longer a hopeless unfinished mess. In fact, it seems to work pretty well. Which isn't to say it won't break in your use case, of course, since it has seen little practical use yet and you're likely to be the first to do some specific thing with it.
I've put together enough documentation for people who don't want to read source code to be able to figure out how the system works. Some of those docs are still rough (issue reports welcome), but they should be usable.
You can read the change log if you want to see, in some detail, what I've been up to in the past months.
The retrospective for the Xi editor project has inspired me to also spend some time thinking about this project's goals, and how they developed.
In the initial announcement, we emphasized these goals:
Accessibility (especially screen reader usability)
Mobile support
Native Unicode/bidirectional text support
Modularity
Performance (especially avoiding performance cliffs for huge files or long lines)
Well-designed, TypeScript-friendly programming interface
Most of these we followed through on pretty much as planned. The exception to this is the bidirectional text and Unicode support—that did get implemented, but not by leaning on native behavior.
The reasons for this are:
It is often extremely hard or awkward for scripts to interact with the native behavior. Though browsers compute an ordering for bidirectional text, scripts cannot access this ordering. Though browsers (sort of) have advanced cursor motion logic, the only way to get at this is to focus an element, put the selection in there, use Selection.modify to move it, and then see where it went.
Often the native behavior just isn't very good. Chrome, after we started this project, seems to have given up on proper visual cursor motion. When there are non-text elements in the document, cursor motion is inconsistent between browsers and often just broken.
Even when the native behavior isn't downright buggy, it might not be what we want for a code editor. Selecting by word should take criteria specific to the programming language into account, for example.
Thus, the project has ended up with its own implementation of Unicode segmentation, bidirectional text ordering, and, on top of that, cursor motion. The original vision was that we could avoid this, but that didn't turn out to be realistic. On the bright side, not depending on native behavior makes the library a lot less susceptible to all the bugs and inconsistencies in those native implementations.
Of the remaining points, modularity is probably the one that I underestimated the most. Just organizing the library core as a number of separate modules was already a somewhat subtle exercise, but the real difficulty lies in making sure 3rd party code composes smoothly. The extension system went through four or five rewrites, and drove me to desperation at times, but I'm happy with what we landed on.
Avoiding performance cliffs has also required a lot of discipline to make sure complexity is linear in document or line length only where it absolutely needs to be, as well as careful API design to avoid making the slow approach the easiest or most obvious one. But I feel the vision has been realized well.
Having lived with a latent anxiety about whether I'd be able to deliver this project's promises for over two years now, I think I'm about to start actually believing I can. That's a relief.
The main thing you'll probably notice, when trying to integrate the library, is that a lot of languages don't have any support yet. This is the next thing on my roadmap. I'll implement proper support for some more major languages, and port most of the old CodeMirror 5 modes so that they can run in CodeMirror 6 (though, apart from some languages that are absolutely unparseable in a context-free way, the vision is to move to Lezer grammars for all important languages).
Next up would be the mostly-drop-in compatibility wrapper to the CodeMirror 5 interface that I've been talking about.
(But right now, we have summer holiday here, and the pace of work will be slow to nonexistent during the coming weeks.)
I won't introduce anything new or exciting here—the design I ended up with is a very boring non-distributed operational transformation. In a way, this post is publishing a negative result: I looked into a bunch of interesting alternative approaches, but found they didn't meet the requirements for this system.
Since collaborative editing is a tricky field with a lot of different solutions, all of which have their awkward trade-offs, I think the path towards this boring end result might still provide a useful resource for people working on similar systems.
There's quite a disconnect between the scientific literature on collaborative editing and what most collaborative editors are doing. The literature is largely concerned with truly distributed collaboration, where a number of peers, taking equivalent roles, directly exchange updates among themselves and still somehow converge on the same document. A typical web system, on the other hand, has clients talking to a server, which orchestrates the exchange of updates.
These problems are very different, and if you're aiming to implement the latter, about 95% of collaborative editing literature is discussing a problem you do not have. Working in a truly distributed fashion is very attractive in principle, of course. It is strictly more general, and has connotations of escaping the over-centralized modern web. But it does drag in a lot of problems, even outside of the document convergence—peers have to store the document along with, depending on the technique used, its entire history. They have to somehow discover each other, maintain connections, and so on.
So for the core of a JavaScript-based library, I decided that support for a distributed model wasn't important enough to justify the additional complexity. I'll get back to what that complexity looks like later.
(I'm sorry, I'm going to explain operational transformation again, just like a hundred other blog posts. Hang tight.)
Operational transformation involves, in its simplest form, a transformation function that takes two changes A and B, which both apply to the same document, and produces a new pair Aᴮ (a version of A that applies to the document produced by B) and Bᴬ (a version of B that applies to the document produced by A), such that A + Bᴬ and B + Aᴮ (where + indicates change composition) produce the same document.
This can be visualized as a diagram like this:
Docˢ
A ↙ ↘ B
Docᴬ Docᴮ
Bᴬ ↘ ↙ Aᴮ
Docᴬᴮ
You can roughly think of the transformation as moving a change through another change. Sometimes this just involves adjusting the positions it refers to (moving “insert A at 5” through “delete 0-1” yields “insert A at 4”), sometimes it is more involved (two changes can't delete the same content, so in that case some of the deletions have to be dropped in the transformed change).
For a plain-text document model, such a transformation function is not very hard to define.
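Here is a rough sketch of the simplest case—transforming an insertion through another change—using an invented {from, to, insert} change shape. Handling overlapping deletions and breaking ties between concurrent insertions at the same position is where the real (convergence-critical) work lies; this is only meant to make the idea concrete.

// Not CodeMirror's implementation; a change is a single replacement here.
interface Change { from: number; to: number; insert: string }

// Map a position in the old document to the document produced by `ch`,
// staying before text inserted exactly at that position.
function mapPos(pos: number, ch: Change): number {
  if (pos <= ch.from) return pos                                       // before the change
  if (pos >= ch.to) return pos - (ch.to - ch.from) + ch.insert.length  // after it: shift
  return ch.from                                                       // inside the deleted range
}

// Transform an insertion-only change A so that it applies after B.
function transformInsert(a: Change, b: Change): Change {
  const from = mapPos(a.from, b)
  return {from, to: from, insert: a.insert}
}

// Moving “insert A at 5” through “delete 0-1” yields “insert A at 4”.
console.log(transformInsert({from: 5, to: 5, insert: "A"}, {from: 0, to: 1, insert: ""}))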
The other part of an operational transformation system is its control part, which determines how changes are shared and when they are transformed. A simple centralized setup can do something like this:
Keep a list of unsent local changes.
Periodically try to send those to the server, which will either accept them or, if another client sent a change first, reject them because your changes start in a base document that doesn't match the current central document.
When the server sends you changes, transform your unsent changes and the remote change over each other. Apply the transformed remote change locally, and replace your stored local changes with the transformed version, so that the next time you try to send changes you send those.
Because this makes everything proceed in lockstep, it is easy to see that this converges—the only difference in order in the way clients apply changes is when remote changes come while there are local changes pending. This is exactly the scenario that the transform function's convergence handles.
When there is more than one remote or local change involved, the situation is slightly more tricky. If the change representation that your transform function operates on allows series of changes to be composed into a single change, you can first do that and then perform a single transform.
If not, you basically need to build up a bigger rectangle. If A and B are remote changes, and X and Y local changes, you can do something like this:
Docˢ
A↙ ↘X
Docᴬ Docˣ
B↙ ↘Xᴬ Aˣ↙ ↘Y
Docᴬᴮ Docᴬˣ Docˣʸ
Xᴬᴮ↘ ↙Bˣ Yᴬ↘ ↙Aˣʸ
Docᴬᴮˣ Docᴬˣʸ
Yᴬᴮ↘ ↙Bˣʸ
Docᴬᴮˣʸ
In each individual diamond in this monstrosity, the lower arrows are made up by transforming the upper arrows against each other. So the whole thing can be built up with O(N×M) transformations, where N and M are the number of changes on both sides. The guarantee that a single set of changes can be transformed over each other and still converge can thus be used to compute transformed versions of bigger groups of changes—the bottom sides of the big diamond provide the final transformed versions of the change sets.
So that's pretty easy. Still, the general consensus seems to be that OT is hard. There's an amusingly long list of papers in this space that have later been proven to be incorrect.
If you've been programming for a while you've probably run into this kind of thing before: there's a hack that works simply and efficiently, but completely falls apart when you add more requirements. OT is such a hack.
When the document structure is more complicated than plain text, and change structure is more complicated than just insertions and deletes, it becomes very hard to define a converging transformation function. And when you don't have a centralized party to determine the order in which changes get applied, you need, as far as I understand the literature, to keep data beyond the current document (“tombstones” that describe where deletions happened), to guarantee convergence.
The main problem is that this technique is hard to reason about. It has many practical advantages but, being a hack, doesn't provide the kind of mental framework that would allow you to confidently apply it in different situations.
That is where CRDTs come in. They are another approach to convergence that, contrary to operational transformation, does provide a way for us mortals to reason about convergence.
Basically, CRDTs complicate the way they represent data and changes to the point where you can apply changes in any order, and as long as you applied the same changes, you get the same result.
For text-style data, CRDT approaches roughly work by assigning every character a unique ID. There are two general approaches to representing changes.
The first keeps deleted characters in the document forever, just marking them as deleted. Character insertions provide reference IDs for the character before and/or after them, and those references, along with data (often a logical clock) embedded in the IDs allow each client to compute the same ordering for the characters.
The second approach generates IDs in such a way that they are already ordered. When inserting something at a given position, you generate a new ID that sits between the existing IDs around that position. This means character IDs must be able to grow in size, so that you can always create a new ID between any given pair. You'll also need to include some client-specific data to make sure IDs are unique. But on the bright side, this approach doesn't require you to keep deleted content around.
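As an illustration of the second approach (not anything CodeMirror uses), ID generation might look roughly like this, with IDs as number arrays in the range 0–1000 compared lexicographically, and a client ID appended for uniqueness:

type ID = number[]

// Generate an ID that sorts between `a` and `b` (assuming a < b), growing
// the ID one level deeper whenever there is no room at the current level.
function between(a: ID, b: ID, clientId: number): ID {
  const result: ID = []
  for (let i = 0;; i++) {
    const low = a[i] ?? 0, high = b[i] ?? 1000
    if (high - low > 1) {               // room for a fresh number at this level
      result.push(low + Math.floor((high - low) / 2), clientId)
      return result
    }
    result.push(low)                    // no room: keep this element and go deeper
  }
}

// A character inserted between the characters with IDs [500] and [501]
// gets an ID like [500, 500, <clientId>], which sorts between them.
console.log(between([500], [501], 1))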
Given such a data type, collaborative editing becomes a matter of making sure everybody ends up receiving everybody else's changes, eventually.
But I guess you can see why this isn't a no-brainer either. Compared to OT, where your document representation can just be a minimal representation of the text inside it, these techniques require an extra data structure to be stored for every character in the document. And with some of them, you don't get to delete those characters either.
There are approaches that exploit the way text is often inserted in a linear way (or loaded in bulk at the start of the editing session) to compress this data somewhat by generating sequential IDs for such spans of text, and storing them as a single element. But these introduce further complexity and don't really help provide an upper bound on the data needed—in a heavily edited document the representation may well degenerate to something close to the ID-per-character structure.
For CodeMirror's core, data structures and complications like this conflict with the goal of supporting huge documents and having a pleasant programming interface. So I've decided not to introduce a CRDT in its internal data structures.
It may be entirely reasonable to wire an instance of the library up to an external CRDT implementation, though, as has been done for ProseMirror with Yjs.
In general, I agree with this article that the strict separation between OT and CRDT is not really justified. OT algorithms that support decentralized editing rather resemble CRDT algorithms. But the distinction between dumb-as-rocks classical OT and decentralization-capable approaches is useful.
So the plan is to support centralized collaborative editing out of the box, and punt on the complexities of decentralized collaboration.
I started the project with the idea that I'd just do what ProseMirror does, since that worked well before.
To recap, ProseMirror uses something similar to OT, but without a converging transformation function. It can transform changes relative to each other, but not in a way that guarantees convergence. ProseMirror's document structure is a tree, and it supports various types of changes, including user-defined changes, that manipulate this tree. I didn't have the courage to try and define a converging transformation function there. When a client receives conflicting remote changes, it first undoes all its local pending changes, applies the remote changes, and then the transformed form of its local changes. This means that everybody ends up applying the changes in the same order, so convergence is trivial.
In CodeMirror, the document model is a lot simpler, so though I got pretty far with the ProseMirror approach, it seemed needlessly awkward. A sequence of individual changes, each of which applies to the document created by the previous one, isn't really how the user thinks about updating the document.
If you, for example, want to surround the range from position 10 to 20 with parentheses, you'd have to create a change “insert ( at 10”, and then, because the first change inserted a new character, the second change would be “insert ) at 21”. You could sometimes get around this by creating the changes in inverse order, but that doesn't always work either.
Also, when doing things like updating the document data structure, its representation on the screen, or some auxiliary data structure that tracks the document, you need the precise extent of a set of changes. This can be derived from a sequence of individual changes, but it seems like it'd be more pleasant if changes were directly kept in that shape.
So I moved to a system where you don't deal with individual changes, but rather with change sets, which represent any number of changes in a flat format, as a sequence of untouched and replaced spans. The parentheses example above would be represented as “keep 10, insert (, keep 10, insert )”.
(In fact, it'd get another kept span at the end covering the rest of the document. This representation knows the length of the document it applies to, and will refuse to perform operations where these lengths do not match. This dynamically catches a whole class of programmer errors that would be silently ignored before.)
To create a change, you specify your changes with offsets into the current document, and the library will sort and combine them into a single change set.
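A sketch of what this looks like with @codemirror/state (assuming the current API): both insertions refer to positions in the current document, and the library combines them into a single change set.

import {EditorState} from "@codemirror/state"

let state = EditorState.create({doc: "one two three four five"})
let tr = state.update({changes: [
  {from: 10, insert: "("},
  {from: 20, insert: ")"},
]})
console.log(tr.state.doc.toString()) // "one two th(ree four f)ive"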
Given this representation, it seemed an obvious optimization to use real OT, rather than ProseMirror's faux OT, since for this domain it's not hard to define.
Transforming does not just come up with collaborative editing. If you want an undo history that can undo some changes but not others (which is useful for collaborative editing, where you don't want to undo other people's changes, but also comes up in other scenarios), you'll need something similar.
In the simple case, the undo history stores the inverse of the changes you made, and when you undo, it applies those inverted changes. But if you have, say, change A stored in the history, and make a change B, which should not be undoable, you can't just do nothing. Change A applies to the current document, not the document created by B.
So you must create transformed versions of the changes again. The transformed change Aᴮ gets stored in the history instead of A, and, if there are more levels of history below that, the next level must be transformed with Bᴬ, and so on.
Something that comes up quite a lot in an editor is the need to transform a document position in an original document into a corresponding position in the changed document. If text is inserted somewhere before the selection, the selection should move forward with the surrounding text. Some OT systems call this “cursor transformation”, but it also applies to things like ranges of collapsed code, lint warnings, breakpoint markers, and so on.
It's not hard to write a function that takes a position and a change and returns a reasonable new position in the new document. But there are a few complications.
Firstly, when an insertion happens directly at your position, it is ambiguous which side the mapped position should be on. So you'll probably want to have two forms of position mapping—one that stays before insertions, and one that stays after them.
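A small sketch of the two biases, assuming @codemirror/state's ChangeSet and its mapPos method:

import {ChangeSet} from "@codemirror/state"

// In an 11-character document, insert "!!!" at position 5.
let changes = ChangeSet.of({from: 5, insert: "!!!"}, 11)
// Position 5 sits exactly at the insertion point, so the result depends on the bias:
console.log(changes.mapPos(5, -1)) // 5 — stay before the inserted text
console.log(changes.mapPos(5, 1))  // 8 — stay after it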
Secondly, though most OT systems just represent changes as a series of insertions and deletions, mapping seems to require us to distinguish replacements from the corresponding delete/insert pair. When a replacement happens next to a given position, the position should never move across the replacement, regardless of whether you're mapping forward or backward. In order to be able to support this, CodeMirror's change representation is encoded as a series of replacements. (Insertions become replacements that don't delete anything, and deletions replacements that don't insert anything.)
And thirdly, worst of all, though it wasn't that hard to use OT to make documents converge, it does not seem to be possible, in an OT-style system, to make mapped positions converge. This is usually not much of a problem, since most of the things that need to be mapped are local to a single client, but there are situations, like collaboratively editing annotations that refer to a given range in the document, where it would be extremely useful to know that everyone's representation of these ranges will stay the same as the document changes.
Let's look at an example. Two users make concurrent edits—A deletes characters 2 to 4, and B inserts an X at position 4. After the clients have synced up, they've made these changes:
A: delete 2 to 4, insert X at 2
B: insert X at 4, delete 2 to 4
If we are mapping position 2, with a forward bias, user A will, for the first change, leave the position at 2, and then, because of the forward bias, map it to the end of the insertion made by the second change, ending with position 3.
User B, on the other hand, finds that the initial insertion didn't affect position 2 at all, since it's two positions ahead of it. The deletion also doesn't affect it, so they end with position 2.
This doesn't seem to be an artifact of my implementation or anything like that. Mapping through deletions loses information (in this case, the fact that our original position isn't adjacent to the insertion), and thus I'm confident that, in an OT-style system that doesn't track tombstones for deletions, or any system that defines this kind of mapping in terms of plain document offsets, doing this in a converging way is impossible.
This is kind of weird. It took me a while to accept that we can make documents converge, but not positions, which are so much simpler. The explanation for that is, I guess, that for document maintenance, what is inside deleted ranges becomes irrelevant as soon as those ranges are deleted, whereas position mapping, to be robust, would still need to be able to compare positions inside such ranges.
(Depressingly enough, a similar problem exists when we are just composing changes—mapping through changes X and Y separately may produce a different result from mapping through X+Y composed into a single change.)
This is a large part of the reason why I spent weeks researching CRDTs even though my document convergence needs are well covered by OT. If you have a more fine-grained way to address positions (specifically, one that can refer to deleted positions), and you define position mapping in terms of that, you do not have this problem.
Since CRDTs tend to assign unique IDs to every character, and some techniques keep those around even when deleting the character, you could use such IDs instead of document offsets to track positions. They never change, so you wouldn't even need to do anything to map positions across changes.
(Though, since all CRDT approaches seem to separate insertions from deletions, this would degrade mapping around replacements again.)
Still, that did sound promising. But, again, the cost of such a representation is significant, and in the end, I judged the requirement for converging positions to be too obscure to justify that level of extra complexity and memory use.
The MOSS program asks for a closing retrospective blog post describing the progress made during the grant period. I've written about the progress in the first 9 months of the year in my status update post from August. To summarize, in that period we:
Implemented solid text composition support.
Designed a powerful extension system.
Documented the system with doc comments and set up a process to generate HTML documentation from that.
Created a parser system and integrated it with syntax highlighting.
Implemented and improved various less glamorous subsystems (styling, gutters, in-text widgets).
The past few months have been focused more on concrete user-visible features, instead of big architectural questions. We've added these features:
Code folding support.
A generic extension for showing UI panels above and below the editor.
A search and replace interface.
A generic tooltip extension.
Support for in-editor linting, highlighting issues in the text.
Support for autocompletion.
A theming system that can be used to customize the styling of the elements produced by these various extensions.
A way to provide translations for the strings used in the UI.
You can see many of these features in action in the demo on the website.
Working on concrete extensions was a good way to find out how well our abstractions work in practice. Though we did end up adjusting some things, on the whole the system was a pleasure to work with.
Specifically, the way extensions can be built up out of different behaviors and other extensions was extremely useful. "Behaviors" (the name will probably change to "aspect" sometime soon), the named fields that extensions can add values to, are not just useful for configuration, but also allow, for example, extensions that want to display a panel to simply provide a behavior that indicates this, which is read by the panel-displaying extension. This models something that would be a side-effect (opening and closing panels) in most designs in a much simpler, more robust way.
Work is still ongoing, and will continue into 2020. The list of missing functionality is getting shorter and shorter, but there's a bunch of stabilizing and ease-of-use improvements still ahead. In any case, given how easy the above extensions were to implement, I'm slowly starting to believe that our base architecture is solid.
This post describes a new parsing system I wrote for CodeMirror, a source code editor. It frames the system with some history, and digresses into some neat architectural details.
Editor features like syntax highlighting, bracket matching, code folding, and autocompletion all involve some level of parsing. Unfortunately, since editors have to handle many different languages, they require a generalized approach to parsing.
CodeMirror is in the process of being rewritten, and I wanted to improve the way it parses its content. Parsing inside of an editor comes with its own unique set of constraints, which can be hard to satisfy. Though I had been planning new approaches for years, all I had to show for them so far was a pile of dead ends.
The constraints that make the parsing problem in a code editor hard are roughly these:
The document is constantly changing.
You can't do anything expensive. If the parsing work takes too long, it'll introduce latency that makes editing feel sluggish and unresponsive.
The input is often not in a finished, syntactically correct form. But you still have to make some sense of it—nobody wants an editor where most features stop working when you have a syntax error in your document.
You often want to be able to mix several languages/grammars in a single document (think HTML with JavaScript and CSS embedded in it).
Keeping those in mind, let's go over the approaches I've tried.
The system as it exists in CodeMirror 5 now (which is pretty much what we've been using from the very beginning) is a simple one. For each language, you write a tokenizer which splits the input into pieces, and labels each piece with some syntactic category (such as variable, keyword, or number). The tokenizers can be stateful, which allows them to secretly be full parsers if they want to.
This state must be copyable, so that the editor can strategically store tokenizer states from a previous run, and after a change, resume one close to that change to avoid re-tokenizing the entire document. Because we are usually only interested in the code in the visible viewport, this means the complexity of re-tokenizing is bounded by the distance between the change and the end of the viewport. Since most changes happen inside of that viewport, this works well in practice.
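For readers who haven't written one, this is roughly the shape of such a stateful CodeMirror 5 mode—a simplified sketch, not a real language:

CodeMirror.defineMode("demo", function() {
  return {
    // The state can hold anything, which lets a "tokenizer" secretly be a full parser.
    startState: function() { return {inComment: false} },
    // States are copied so the editor can resume tokenizing near a change.
    copyState: function(state) { return {inComment: state.inComment} },
    token: function(stream, state) {
      if (state.inComment) {
        if (stream.match("*/")) state.inComment = false
        else stream.next()
        return "comment"
      }
      if (stream.match("/*")) { state.inComment = true; return "comment" }
      if (stream.match(/^\d+/)) return "number"
      if (stream.match(/^[a-zA-Z_]\w*/)) return "variable"
      stream.next()
      return null
    }
  }
})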
Such tokenizers are awkward to write directly, so over the years several attempts have been made to build abstractions over them. The first was the Common JavaScript Syntax Highlighting Specification, an attempt by the authors of Mozilla Skywriter (formerly Bespin, later merged into ACE) to define a declarative format for describing tokenizers as state machines with regular expressions (describing the tokens) as edges. The ACE project ended up with an incompatible but similar format (too entangled with their internals to use in CodeMirror, unfortunately). I did an implementation of the original spec for CodeMirror, and then another incompatible extension because the base spec was too limiting. There are a few CodeMirror modes still based on that code, but it was no real success.
I think the reason such state machines (and the somewhat related TextMate grammars which are in wide use in desktop editors) never felt like a great solution is that, once you get past trivial grammars (where their declarative simplicity does look really nice), they don't really help that much with abstraction. Manually designing complicated state machines is a chore. Regular expressions, which are bad enough on their own, become downright terrifying when you have to construct all your edges out of them, often stuffing multiple tokens into a single expression to avoid creating intermediate states. This “abstraction” has a tendency to produce uglier, less maintainable code than what you'd get when writing the tokenizer as plain code.
So in 2017, I started an ambitious project to create a better way to abstractly define incremental tokenizers. I had concluded that classical parser generators based on context-free grammars were never going to work in this context (for reasons that I'll come back to later on). But I kept coming across parsing expression grammars, which weren't based on context-free grammars and had some interesting properties, such as being able to combine multiple grammars to create a new grammar (which is great for mixed-language documents).
So I spent several months building a parsing system that took a PEG-like grammar, compiled it down to a state machine, and made it possible to run that state machine as a CodeMirror language mode.
This system is a marvel. It uses a moderately sophisticated optimizing compiler to generate the state machines. The result works quite well, and is used in several real-world systems today. But unfortunately, if I'm honest, it is a tragically bad idea taken way too far.
Parsing expression grammars are parsed by backtracking. And as such, they are very poorly suited for implementing a stateful tokenizer. In a backtracking system, you never know when you've definitely parsed a piece of content—later input might require you to backtrack again. So what I ended up with was actually not PEG at all, but a system where you had to explicitly annotate where the parser should look ahead. Though grammars written this way were relatively readable, they involved a lot of finicky, error-prone kludges to resolve local ambiguity.
Also, parsing PEG is just really inefficient. Such grammars are “scannerless”, meaning they don't make a distinction between tokenizing and parsing. When parsing that way naively, you basically have to run your whole parsing logic for every input character—probably multiple times, due to backtracking. A lot of the magic in the compiler was intended to recover the tokens that were implicit in the grammar, in order to regain some efficiency. But the system never came close to hand-written language modes in terms of speed.
So, though I knew I needed a new approach, I went into the CodeMirror 6 rewrite without any specific idea on what that approach would look like.
And then I saw tree-sitter, and was enlightened.
Tree-sitter is a parser system written with the code editor use case in mind, and is in the process of being integrated into the Atom editor. It takes a much more ambitious approach to what a parser inside an editor should do: It builds up a full, accurate syntax tree for the content.
You can do so much more with an actual syntax tree than with a sequence of tokens. Whereas tokens, possibly augmented with some information stored in the tokenizer state, allow you to sort of approximate understanding some aspects of the code's structure, a tree usually gives you precisely the information you need.
Most of the ideas that tree-sitter uses aren't new, in fact a paper from 2000 describes a somewhat similar system. But as far as I know, tree-sitter is the first system that puts them all together into a practical piece of software.
Unfortunately, tree-sitter is written in C, which is still awkward to run in the browser (and CodeMirror targets non-WASM browsers). It also generates very hefty grammar files because it makes the size/speed trade-off in a different way than a web system would.
But good ideas can be ported. Lezer is a JavaScript-based system heavily inspired by tree-sitter.
For a long time, I was firmly convinced that classical parser systems based on context-free grammars and LL or LR parsing algorithms were just not suitable for the editor use case. My arguments for this were...
Context-free grammars are a limiting abstraction that breaks down as soon as the language does anything funky. Needing the grammar to be LR or LL to please the parser generator further pins you into a corner.
This is not wrong. Expressing operator precedence in a pure context-free grammar requires writing a silly formulaic rule for each level of precedence. And when you need to implement something like automatic semicolon insertion or whitespace-sensitivity, which would be a couple of lines of code in a hand-written parser, you can't express that directly, and have to somehow escape the context-free abstraction.
Making such a grammar suitable for an LR parser generator can be even more tricky, and often requires you to have a rather deep understanding of how the parser generator works.
But like many things, once you get to know them, they aren't that bad. Parser generators can support precedence declarations, which make operator parsing a lot less terrible. They can even output decent error messages.
Supporting dynamic resolution of ambiguities through something like GLR parsing can provide a practical way out of situations that parser generators are traditionally bad at.
And contrary to some of the abstractions I mentioned before, this one actually gets us something. Context-free grammars, when combined with a proper parser generator, really do give us fast parsers from readable, compact grammar declarations.
A strict separation between the tokenizer and parser is problematic.
It is, in many languages (think of JavaScript's ambiguity between regular expressions and the division operator). It also tends to make mixed-language parsing harder.
But just because this type of parser is traditionally run with a completely separate tokenizer doesn't mean it has to be. Having the parse state drive the tokenizer is largely unproblematic. You can even have the parser generator set this up automatically, without user involvement.
Generated parsers are way too big.
A naively generated LR parser is huge, and many tools spit out embarrassingly big files. But with careful parser state deduplication and table compression such a parser can be made about as compact as a hand-written one.
Making such a parser error-tolerant is extremely cumbersome.
If you search the scholarly literature for approaches to error-tolerance in LR parser systems, you get a lot of results, with a lot of different approaches, but none of them are very practical. Most require the grammar writer to explicitly annotate the grammar with error-recovery strategies, bloating the grammar and putting the responsibility for getting it right on every grammar author.
Tree-sitter ingeniously abuses GLR parsing, where the parser can try multiple interpretations simultaneously, to integrate automatic error-correction without a lot of extra complexity. Lezer copies this approach.
I called my tree-sitter copycat project Lezer, which is the Dutch word for reader (and pronounced a lot like laser). It is a bit less advanced than tree-sitter in some areas, a bit more advanced in others, and simply different on quite a lot of points, as determined by a different set of priorities and tastes.
CodeMirror 6 will retain the ability to run a classical stateful tokenizer, but its recommended way to define a language mode is to write a Lezer grammar and wrap it in a CodeMirror-specific package that adds some editor-related metadata.
Lezer is an LR (with opt-in GLR) parser generator. It has support for incremental parsing, where you can cheaply re-parse a document after local changes have been made to it by reusing pieces of the old parse tree. It automatically tries to recover and continue parsing when it runs into a syntax error, leaving markers in the output tree that indicate where the recovery happened.
Lezer consists of an off-line parser generator tool, which takes a grammar description and outputs a JavaScript module containing a parser for that grammar, and a parser run-time system (which such output files depend on) to do the actual parsing. Only the run-time system and the generated parser need to be loaded by the editor.
The parser outputs non-abstract syntax trees, meaning that it just creates a raw tree structure containing the constructs it parsed (with information on where it found them), without organizing them into a clean, easy-to-use data structure.
The system is optimized for compactness, both in parser table size and syntax tree size. It needs to be practical to ship a bunch of parsers to a user on the web without producing megabytes of network traffic, and it needs to be realistic to keep syntax trees for large documents around without running out of memory.
The Lezer guide provides a more thorough introduction, as well as a description of its grammar notation. In this blog post, I want to go into the neat implementation details that aren't relevant in user documentation.
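To give a feel for the notation before diving into internals, this is roughly what a tiny grammar looks like—a sketch of a Lisp-like language of identifiers, strings, and parenthesized applications, so details may not be exact:

@top Program { expression* }

expression { Identifier | String | Application }

Application { "(" expression* ")" }

@skip { space }

@tokens {
  Identifier { $[a-zA-Z_0-9]+ }
  String { '"' (!["\\] | "\\" _)* '"' }
  space { $[ \t\n\r]+ }
}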
The point where I became convinced that I definitely needed to use or copy tree-sitter was when I understood its error recovery strategy.
Say you reach a point where you can no longer proceed normally because there is a syntax error. The rest of the input, after the error, is probably full of meaningful constructs that could still be parsed. We want those constructs in our syntax tree. But our regular parsing process is stuck—it doesn't know how to get from the error to a state where the parse can continue.
I definitely did not want to require the grammar author to add error recovery hints to their grammar. These tend to clutter up the grammar and are error-prone to write. Writing a grammar is hard enough without that distraction.
You can see error recovery as a search problem. There might be a parse state and input position (past the error) where the parse can meaningfully continue. We just have to find it.
The actions encoded in the parse tables, along with some recovery-specific actions that the parser wouldn't normally take, provide a kind of search tree. You start at the state(s) where the error occurred, and keep exploring new states from there.
But what does the accept condition look like? When do you know that you've found an acceptable solution? You could define that precisely, for example as the state that can handle the next N tokens without further errors. But we can also be vague.
The solution found by Max Brunsfeld in tree-sitter is to use the same mechanism that's used to parse ambiguous grammars. A GLR parser can split its parse stack and run both sides alongside each other for a while until it becomes clear which one works out.
That's pretty much exactly what a search algorithm does—it tracks a number of branches that it still has to explore, and continues to explore them, possibly pruning unpromising branches with some heuristic, until it finds a solution.
To be able to get good results, or at least some result, in messy situations like longer stretches of invalid input, each branch has a badness score associated with it, which is increased (linearly) each time a recovery action is taken, and decreased (asymptotically) every time it can consume a token normally.
What we want to do is, after an error, try all kinds of possible recovery tricks, which recursively branch off a large amount of states. But then, after a bit of that, we should consolidate to one or, at most, a few parse states again, because parsing input in a whole bunch of different ways is expensive.
To get this effect, Lezer forbids states with a badness higher than a given multiple of the best state's badness (or some maximum threshold) from applying further recovery actions, effectively dropping those branches when they can't proceed normally. In the case where one branch finds a good way to continue, that branch's badness will converge to zero and eventually stop all worse branches. In cases where the input continues to make no sense, all branches will eventually get a badness score exceeding the maximum, and the parser will only continue one of them.
The recovery strategies used are:
Skip the next token, and try again with the same state after that.
Invent a token—take any of the tokens that are valid in this state, and continue to the state that consuming them would produce. This is the main source of branching, since many states allow a lot of tokens.
Force the end of the innermost production that's currently being parsed.
There are situations where the result of this approach isn't entirely optimal, but it usually does well. The important thing is that it always keeps parsing, and does so in a way that remains tractable (exponential searches are quickly dampened). The system is biased a bit towards the token-skipping rule, so that if all else fails it'll, in effect, just continue skipping tokens until it stumbles into a situation where it can continue parsing.
When you have a parser that may be splitting its state—a lot—and build up parts of the tree multiple times, that duplicate tree building and the bookkeeping involved in it can cause a lot of unnecessary work.
The order in which an LR parser creates nodes is inner-to-outer. It will, for example, first create the node for the operands, and then later the node for the operator expression. This suggests an approach: what if, instead of building a tree structure right away, the parser just keeps a flat log of the nodes it created? This can be an array in which the nodes are arranged in post-order, with children coming before parents.
The parser just appends to this array. When splitting the state, one state keeps the existing array, and the other gets a new empty array along with a pointer to the state that has the rest of the array, and the length of that array at the time of the split.
Now splitting involves no node copying at all. You do need to copy the state stack, which LR parsers use to track context, but that is generally shallow.
In addition, node allocation becomes as cheap as appending a few numbers to an array. For actions that don't result in tree nodes (Lezer allows you to mark rules as uninteresting, to keep the tree small), you don't have to do anything at all. The control stack stores the output array position at the start of each rule, and can use that to emit enough data to later reconstruct parent-child relationships.
After a parse finishes successfully, the final state's parent-array pointers can be used to find all the nodes that make up the tree, and construct an actual tree structure out of them.
One tricky issue occurs when skipped content (whitespace and comments) produces nodes. If you have code like this...
if (true) something()
// Comment
otherStatement()
... the comment should not be part of the if statement's node. Yet the parser only knows for sure that it can finish that node after seeing the next statement (there might be an else still coming).
In cases like this, where the output array contains skipped nodes immediately in front of a reduction, the parser has to move them forward and store the end of the node before them. Fortunately, this occurs relatively rarely (unless you add nodes for whitespace, in which case it'll happen at the end of every rule that has a possible continuation).
A nice thing about the flat post-order tree representation is that it is compact. Tree structures constructed the usual way, as separately allocated nodes, incur a lot of extra overhead for pointers and allocation headers. They can also have terrible locality, since who knows how far from each other the memory allocator will put the nodes.
Unfortunately, we can't just use a flat representation for our syntax trees. The incremental parser has to be able to reuse parts of it without copying those parts into a different buffer.
But we can use it for parts of the tree. Storing the coarse structure as a classical tree, but the content of smaller nodes (say less than a few thousand characters long) as flat arrays, gives us the best of both worlds. Since most nodes, by number, live in the fine structure, this saves a large amount of overhead (and helps with locality).
That does mean that we can't reuse small nodes. But since their size is limited, the amount of work that is involved in re-parsing them is also limited. And by removing them from consideration, the incremental parser can avoid quite a bit of the work involved in preparing and scanning the tree for reuse.
A small node stores its content in a typed array of 16-bit unsigned integers. It uses 4 such numbers (64 bits) per node, storing a type, a start position, an end position, and a child count for each node. Contrary to the array created by the parser, these arrays are in pre-order, because that makes forward iteration (which tends to be more common than backward iteration) cheaper. The child count was almost obsolete (the end position can sort of tell you which nodes are children), but Lezer supports zero-length nodes, which might land on the end of their parent node and make it ambiguous whether they belong to it or not.
Client code, of course, doesn't want to deal with this representation. Lezer provides an abstract interface to searching in and walking through trees that hides the buffer structure, allowing you to conceptually work with a uniform tree of nodes.
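A sketch of walking a tree through that interface—assuming the current Lezer package names, which differ from the ones in use when this post was written:

import {parser} from "@lezer/javascript"

let tree = parser.parse("let x = 10")
tree.iterate({
  enter(node) {
    // The same code works whether a node lives in the coarse tree or in a flat buffer.
    console.log(node.name, node.from, node.to)
  }
})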
Lezer, like tree-sitter, stores the result of repetitions in the grammar (produced by the * and + operators) as balanced subtrees. This means that, unless your input is pathological (say, a thousand applications of a single binary operator in a row), you tend to get shallow, well-balanced syntax trees, which are cheap to search and allow effective reuse.
Depending on the grammar's complexity, an LR parser generator creates between a dozen and a few thousand parse states for your grammar. These represent syntactic positions like “after the opening paren of an argument list” or “after an expression, possibly expecting some expression suffix”.
The parser generator can figure out which tokens are valid in a given state. It can also, for tokens specified as part of the grammar, automatically determine which tokens conflict (match the same input, or some prefix of each other).
A well-known example of conflicting tokens is the division operator versus regular expression syntax in JavaScript. But others are keywords that can also appear as property names, and the bitwise right shift operator (>>) versus two closing angle brackets in C++.
Lezer will not complain about overlapping tokens if the tokens do not appear in the same parse states. This implicitly resolves the regular expression and property name issues, without any user interaction.
When conflicting tokens do appear in the same place, such as division operators and C-style comments, you have to specify an explicit precedence ordering (comments take precedence) to tell the tool that you know what you're doing.
Contextual tokenization is implemented with a concept called token groups. Tokens that have unresolved conflicts with other tokens are assigned to one or more groups, where each group contains only non-conflicting tokens. Each state is assigned a single group (if it expects tokens that conflict with each other that's an error). This group is passed to the tokenizer, which then takes care to only return tokens that are either in that group, or don't conflict with any other tokens. The check is optimized by storing group membership in a bitset, and seeing if the right bit is set with binary and.
Tokens are compiled down to a single deterministic state machine, which is run on the input character stream. In cases like the regexp-versus-division issue, you don't want the machine to go running through regexp-specific states in a situation where you only allow division, since that would be wasteful. Therefore, each tokenizer state is also tagged with a bitset that tells you which groups the tokens reachable from that state belong to, and the tokenizer stops running when it hits a state that has no overlap with the allowed tokens for the parse state.
Almost all programming languages have special syntactic elements like whitespace and comments that may occur between any tokens. Encoding these directly in the grammar is extremely tedious for most languages.
Traditionally, tokenizers just skip such elements when reading the next token. That works well in most contexts, but makes it awkward to include the elements in the parse tree.
Lezer treats skipped things like they are part of the grammar (though in an optimized way to avoid increasing the size of the parse tables). It is possible to skip things that aren't single tokens (to implement something like nestable comments, for example, or to make sure your block comment nodes consist of smaller nodes so that you can incrementally parse giant block comments).
Each rule or group of rules may have its own set of skipped expressions, so that you can express different sublanguages in a grammar, for example something like the content of interpolated strings, without allowing spacing in places where the language doesn't allow it.
Each parse state has a pointer to a (shared) set of skip actions, which, for the skipped tokens or tokens that start a compound skipped expression, contains the actions to take for those tokens. For single-token skipped elements, that action just tells the parser to skip the token and stay in the same state. For compound elements, it causes the state that handles the rest of the element to be pushed onto the control stack.
The languages that a tool like Lezer needs to handle are wildly different, from JavaScript to Haskell to CSS to YAML. As such, it is difficult to find a cross-language vocabulary to describe their constructs. In fact, it seems like that would be a separate multi-year project, and pull in a serious amount of complexity.
Yet it would be nice if the parser output came with some information that can be interpreted without knowing what language you are working with.
After several iterations, what I decided on was a system where nodes have names, which only have a meaning within the language, and props, which are values associated with tags defined by external code. Integrating a language grammar into CodeMirror involves assigning values for some of these props to the node types used by the language—things like syntax highlighting style information and how to indent such nodes.
Since the number of node types in a language is limited, we can allocate an object for each node type to hold this information, and have all nodes of that type point to the same object.
To allow code outside the grammar to add props without mutating global state, parser instances can be extended with additional props, creating a copy that will output nodes with the props attached. This is especially useful in the context of mixed-language trees.
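A minimal sketch of that copy-on-extend pattern, with hypothetical names rather than the library's actual interface:

// Copy-on-extend props: each node type carries a props object, and
// extending a parser produces a copy with extra props merged in, leaving
// the original untouched. extendProps and the shapes used here are
// made up for this example.
function extendProps(parser, extraProps) {
  let nodeTypes = parser.nodeTypes.map(type => ({
    ...type,
    props: {...type.props, ...(extraProps[type.name] || {})},
  }))
  return {...parser, nodeTypes}
}

let base = {nodeTypes: [{name: "ArgList", props: {}}, {name: "String", props: {}}]}
let forEditor = extendProps(base, {ArgList: {indentStyle: "alignToBracket"},
                                   String: {highlight: "string"}})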
Lezer has support for a limited form of grammar nesting. If language A can appear inside a document in language B, and the end of the region covered by A can be unambiguously found by scanning for a specific token, Lezer can temporarily switch to another set of parse tables while parsing such a region.
The syntax tree will then contain nodes from both grammars. Having props directly attached to the nodes makes it much easier to work with such trees (as opposed to using a language-specific table that associates node names with metadata).
This post is for you if, in the course of that year, you've occasionally wondered what the hell Marijn is doing and if he's getting anywhere at all. I have been absolutely terrible about communicating progress, and even monitoring the repository would often leave you in the dark, as I was working on local branches or entirely different repositories. In any case, the code that's in there is almost entirely undocumented.
Last week, I landed a major set of changes, which had been in the works for about four months. They integrate a new approach to code parsing. This was the last piece of the system that was completely in flux, where I didn't want to nail down anything related to it yet because I wasn't sure how it would end up working.
Now I am sure, which means that the vision for the system as a whole just became a lot more clear.
Apart from the parser work, a lot of design and cleanup has happened in the past year. I'll go over the major elements at the end of this post.
For observers, the more interesting thing is that we finished a doc comment extractor for TypeScript, and are starting to work on adding enough comments to generate a passable reference manual for the new system. Hopefully, that will finally allow outsiders to get a clear view of what we're building.
I intend to start releasing npm packages around that time. They won't be stable or feature-complete. They won't even reflect the package organization of the final system. But they should provide an easy way to start playing with the system.
I'm a bit behind where I had hoped I would be at this point. There are three reasons for this:
My other projects and customers kept demanding attention. How rude. Fortunately, this means that the CodeMirror 6 budget will stretch longer, since the time spent on other things generated its own income.
The parser system turned out to be Really Hard, and took a bit longer than hoped.
I'm definitely suffering from second system syndrome, where after eight years with the old system, I am acutely aware of its limitations and want to fix them all this time around. That sometimes leads me into overly-ambitious design rabbit holes, and then it takes me a week or two to dig myself out again.
So the bad news is that a stable release is still quite a ways off (not going to make any more specific predictions at this point). Some parts of the system are at a point where they may roughly stay the way they are, but other significant parts haven't even been written yet.
These are the major pieces of work that have happened since the project was announced...
Handling composition/IME (input method editor, as used by people whose script has too many characters to fit on a keyboard, but also pretty much all Android virtual keyboard input) in the browser is its own special kind of hell.
During composition, the user interface is in a special mode where a piece of the editable text is being composed. What exactly that means differs per input method, but it usually involves step-by-step updates through pressing additional keys and/or interacting with menus to reach the intended input string.
In normal operation, CodeMirror will constantly sync the content in the DOM with its model of that content—applying highlighting and normalizing DOM elements that don't have the expected shape. However, during composition, if you mess with the DOM around the composition, or with the selection, you will abort the composition, making your editor pretty much unusable to IME users and, on some browsers, causing bizarre side effects such as their composed text being duplicated again and again on every keystroke.
Our old approach was to freeze interface updates during composition, stepping well back and letting the user do their thing until the composition ended, at which point the editor sprang to life again. That led to a number of responsiveness issues, where editor plugins, such as autocomplete, wouldn't be kept up to date on what is happening, causing the interface to appear frozen or outdated.
In CodeMirror 6, I implemented a subtle middle road where the part of the document that is being composed is left alone as long as its content isn't changed by outside code, but the editor's update cycle proceeds as normal, and the part of the document outside of the composition can be changed (say, by collaborative editing, or by the autocompletion plugin updating its list of completions) as normal.
A design where a generic core provides a platform for all kinds of features to be implemented in plugins has to, in a way, provide an extendable extension mechanism, where extensions can define new ways in which other extensions can add or configure behavior. For example, an autocompletion plugin must be able to provide a way for other plugins to register completion sources.
Designing this is tricky, but I think we landed on a nice architecture. I've written another blog post outlining our design. This is working well so far, and has allowed us to create all kinds of features (from the undo history to syntax highlighting to line number gutters) outside of the core library.
Since the browser platform's CSS support is still more or less completely unhelpful when it comes to modularized systems, we've decided to use a CSS-in-JS approach where extensions can define their own styles from within JavaScript code and make sure they are included in the document or shadow DOM where the editor is placed.
The place where pure code (around the immutable editor state abstraction) stopped and imperative code (around the DOM and the editor view) started hadn't really been properly defined in the first iteration.
A pain point was viewport state—information about which part of the document is currently visible. This isn't part of the core editor state, but many UI extensions need to access it in order to operate, usually because, as an optimization, they only act on the visible code.
I've added a new abstraction, view fields, for pieces of state that live in the view and affect things like decorations (styling and widgets in the content) and the attributes of the editor's wrapper DOM nodes.
These can be written in bog-standard imperative style, if you want, but still make it easy to handle editor state changes in a disciplined way—they are notified each time something changes, and provided with a full description of the update, including the viewport information and the transactions that were applied.
Block widgets are a way to insert a block element into the document, either on a line boundary, in the middle of a line, or covering some content.
These existed before but I completely reimplemented them (and the rest of the decoration system) at the start of the year to fix some corner case issues in the old implementation, cleaning up a number of issues and limitations in the display code.
Interestingly, allowing block widgets in the middle of lines, which wasn't initially part of the plan, turned out to be easier than forbidding them, due to the way decorations interact. Another decoration could hide the newline next to a block widget, something the initial implementation could not deal with gracefully.
The first demo came with a line number gutter, but no way to create gutters for other purposes. I generalized the gutter plugin so that it is now possible to create your own custom gutters and dynamically add or change the content that is displayed in them.
We wrote a system that uses the TypeScript library to extract both doc comments and types from a TypeScript project, and output them as a JSON structure. Feeding this into the builddocs tool allows us to build good-looking, interlinked reference documentation directly from the source code.
The system is already being used to generate Lezer's reference docs, and we're working on applying it to CodeMirror.
This is the biggest one, and kept me busy for most of the spring and summer. It was clear that we'd want to move beyond CodeMirror 5's primitive syntax analysis system, but it wasn't clear how.
That is, until I saw tree-sitter, which is a practical implementation of an incremental LR parser, giving you a real syntax tree for your code and updating it cheaply when you make changes. It is used in the Atom editor.
Unfortunately, tree-sitter is written in C, which is awkward to run in the browser (we're still targeting non-WASM browsers). It also generates very hefty grammar files because it makes the size/speed trade-off in a different way than a browser-based system would.
Thus, I set out to clone tree-sitter in JavaScript. And because I always feel I know everything better, I didn't exactly clone it, but rather built a different system inspired by it. That system is Lezer, an incremental parser generator and runtime system for JavaScript.
This was a relatively big project. And then, when I started integrating it with CodeMirror 6, I was forced to go back to the drawing board several times to work out issues.
But the system turned out well. If a grammar has been written for your language (we have JS, CSS, HTML, and XML so far), CodeMirror can keep a syntax tree of your document as you are editing it. This tree is used for (more accurate) highlighting, indentation, folding (soon) and has a host of other potential uses.
Even with the cleverest tools, there isn't always a single definite indentation for a given line. In a language like Python, whether a line should continue the current block or dedent to some parent block isn't obvious from the context. Inside multi-line strings or comments, an editor can only guess what the user is trying to do. But even there, having a guessed indentation that you may need to correct is usually more convenient than having to manually indent every line.
In the process of maintaining CodeMirror, I've written a lot of indentation logic for a lot of languages. Initially directly on top of the source string, by applying heuristic regexps to the ends of previous lines (which was as fragile as it sounds). Then from data left behind by the glorified tokenizers CodeMirror used for highlighting, which worked better, but required those tokenizers to do extra indentation-specific work.
CodeMirror (in version 6) is moving towards building real syntax trees for its content. A big part of the reason that many indenters are ugly hacks is that they usually don't have access to the information they need, at least not in a convenient format.
Syntax trees are that convenient format.
Most forms of indentation fall into two categories. They either indent relative to the indentation of some reference line, or they align to some specific position.
The first is the most common. If you have this code, the function arguments are indented relative to the start of the line where the argument list starts.
call(
  one,
  two)
Because (outside of Lisp) you tend to have different indentation styles for different constructs, it makes sense to associate them with syntax node types. In the example above, we'd be applying the indentation style for argument lists.
To go from a line in a parsed document to an indentation, you first figure out the strategy that applies at this point, which you get from the innermost node that wraps the start of the line and has an indentation style associated with it.
if (x) {
  A
  B
}
That node may span multiple lines. For example, in a block like the one above, when indenting the statements on lines 2 or 3, the wrapping node is the block node. That node's indentation style might say “indent one unit beyond my start line”. Because the block starts on line 1, with indentation 0, we indent the statements a single unit.
The start of line 4 is also inside the block. But because that line closes the block, it should not add indentation. Many node types have some constructs that they shouldn't indent—a closing token is one, but the else token in an if statement, or a hanging { in a K&R-style function definition, are others.
When the content of a bracketed node starts immediately after the opening bracket, many styles align subsequent lines with the opening bracket.
call(one,
     two)
Here the indentation isn't relative to the construct's start line, but to a specific token.
Some constructs, such as multi-line strings or block comments, can indicate that they shouldn't be indented.
When I claimed that nodes compute indentation relative to the line on which they start, I was oversimplifying. If you have code like this:
function foo(a, b,
             c, d) {
  body()
}
The body block probably shouldn't be indented past the c. Yet it does start on line 2, which is indented 13 spaces.
When figuring out the reference line for a given node, you have to check whether that line starts inside another node (that is not a parent node of the target node), and if so, continue looking at the start of that node.
So in the example, we'd notice that line 2 starts inside a parameter list, which isn't a parent of the body block, and thus use line 1 as reference line.
Unfortunately, not all nodes should be skipped like this. For example, in this code the member expression one.two would match the description of a non-parent node covering the start of the line, if the syntax tree is structured like call_expr(member_expr(...), arg_list(...)).
one.
  two(
    three)
But we do probably want to indent the argument list relative to line 2. So this requires us to either use heuristics (such as only skipping bracketed nodes) or some extra language-specific information telling us which nodes must be skipped.
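Put together, the reference-line search might look something like this sketch, where doc.lineAt, tree.nodeCoveringWithout, and the skipForIndentation flag are assumed helpers rather than a real API:

// Find the line whose indentation a node's own indentation is relative to.
function referenceLine(doc, tree, node) {
  let line = doc.lineAt(node.start)
  for (;;) {
    // Innermost node covering the start of this line that is not an
    // ancestor of the node we are indenting relative to.
    let covering = tree.nodeCoveringWithout(line.start, node)
    // Only skip node types that the language marks as skippable
    // (for example bracketed constructs).
    if (!covering || !covering.type.skipForIndentation) return line
    line = doc.lineAt(covering.start)
  }
}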
CodeMirror's new indenter requires language packages to attach indentation information to syntax tree node types. That way, even if the document contains multiple languages, it can figure out an indentation directly from the tree.
This information takes the form of a function that computes an indentation depth, given some contextual information like the syntax tree, the target node, the document, and the size of an indentation unit.
Being a function, it can be as complicated as it needs to be. There are helpers for defining functions that support common styles, such as bracketed nodes that align content to their opening bracket when that isn't the last token on the line.
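For instance, an indentation function for a braced block might look roughly like this, where the context fields (textAfter, baseIndent, unit) are assumptions for the sake of the example:

// A made-up indentation function for a braced block. textAfter is the text
// on the line being indented, baseIndent the indentation of the node's
// reference line, and unit the configured indentation unit; all are
// assumed context fields for this sketch.
function indentBlock({textAfter, baseIndent, unit}) {
  // A line that closes the block aligns with the reference line instead
  // of adding a unit.
  if (/^\s*\}/.test(textAfter)) return baseIndent
  return baseIndent + unit
}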
Because content is often in a syntactically invalid state when indentation is requested, the indenter takes care to “enter” unfinished nodes in front of the cursor when determining the context for the indentation. For example, if you press enter after this unfinished if statement...
if (x < y &&
The syntax tree will look something like...
if_statement(
  paren_expr(
    binary_expr(binary_expr(...), incomplete),
    incomplete),
  incomplete)
Where incomplete means that the node was cut off by the parser's error-correction feature. The start of the next line is outside of all this, but because we know that the node before it was incomplete, we can indent as if we are in that node—so that puts us inside of binary_expr, which may not have an indentation style. But paren_expr probably does.
if (x < y &&
    // continue here, aligned to the (
It has become fashionable to structure big systems as a number of separate packages. The driving idea is that, instead of locking people into your implementation of a feature, it is better to provide the feature as a separate package that they can load alongside the core system.
This gets you, roughly...
The ability to not even load features you don't need, which is especially helpful in client-side systems.
The possibility of replacing functionality that doesn't serve your purpose with another implementation. This also reduces the pressure on the core modules to cover every possible use case.
A reality check for the core interfaces—by implementing basic features on top of the client-facing interface, you are forced to make that interface at least powerful enough to support those features, making sure that things just like them can be built with 3rd-party code.
Isolation between parts of the system. Contributors just have to look at the package they are interested in. Packages can be versioned, deprecated, or replaced without impacting the core.
The cost of this approach is mostly one of complexity. You can provide a batteries-included wrapper package to get users started, but at some point they will probably have to remove the cover and start installing and configuring specific helper packages, which tends to be harder than flipping on a feature in a monolithic library.
This post will try to explore designs for extension mechanisms that support “extension at scale” and unanticipated new extension points.
What do we want from an extensible system? Firstly, of course, it has to allow external code to extend its behavior.
But that is hardly enough. Let me illustrate with an anecdote about a stupid thing I did at some point. I work on editor software. An early version of a code editor allowed client code to set the style of a given line. This was great—now you can selectively style a line.
Except that, as soon as two independent pieces of code try to style a line, they will step on each other's toes. The second extension to touch the line will override the first extension's style. Or, when the first one tries to remove its styling at some later point, it will instead clear the second one's style.
The solution was to make it possible to add (and remove) styling, instead of setting it, so that two extensions can interact with the same line without sabotaging each other.
More generally, you have to make sure that extensions can be combined, even if they are entirely unaware of each other's existence, without causing problematic interactions.
To do this, each extension point must support being acted on by any number of actors. How multiple effects are handled differs by use case. Some strategies that may make sense are:
They all take effect. For example, when adding a CSS class to an element or showing a widget at a given position in the text, you can just do all of them. Some kind of ordering is still often needed, though: the widgets need to be shown in a predictable, well-defined order.
They form a pipeline. An example of this would be a handler that can filter changes to the document before they are applied. Each handler gets fed the change produced by the handler before it, and can further modify it. Ordering is not essential here, but can be relevant.
A first come, first served approach can be applied to, for example, event handlers. Each handler gets a chance to handle the event, until one of them declares that they have handled it, at which point the ones behind it in line don't get asked anymore.
Sometimes you really need to pick a single value, such as to determine the value of a specific configuration parameter. Here it may make sense to use some operator (say, logical or, logical and, minimum, or maximum) to reduce the inputs to a single value. For example an editor might enter uneditable mode if any extension tells it to, or the maximum document length might be the minimum of the values provided for that option.
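As a sketch of that last strategy, reducing many inputs to one value is just a matter of picking a combining operator:

// Reduce any number of provided values to a single configuration value.
const combineReadOnly = values => values.some(v => v)               // logical or
const combineMaxLength = values => values.length ? Math.min(...values) : Infinity

combineReadOnly([false, true, false])  // → true: one extension is enough
combineMaxLength([4000, 10000])        // → 4000: the strictest limit wins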
With many of these strategies, ordering is significant. That means that the precedence with which effects are applied should be controllable and predictable.
This is one of the places where imperative, side-effect based extension systems tend to fall short. For example, the browser DOM's addEventListener operation will cause event handlers to be called in the order in which they were registered. This is fine if a single system controls all the calls, or if the ordering happens to be irrelevant, but when you have multiple pieces of software independently adding handlers, it can become very hard to predict which ones will be called first.
As a concrete example, I applied the modular strategy for the first time in ProseMirror, a rich text editor system. Its core is pretty much completely useless on its own—it relies on additional packages to describe the structure of documents, to bind keys, to provide the undo history. Though the system is a bit challenging to use, it has seen adoption in systems that need to customize things that classical editors don't allow you to customize.
ProseMirror's extension mechanism is relatively straightforward. When creating the editor, the client code specifies a single array of plugin objects. Each of these plugins can influence various aspects of how the editor works, doing things like adding bits of state data or handling interface events.
All these aspects have been designed to work with an ordered array of configuration values, using one of the strategies outlined in the previous section. For example, when you specify multiple key maps, the ordering in which you specify the instances of the keymap plugin determines their precedence. The first keymap that knows how to handle a given key gets it.
This is usually powerful enough, and people have been making good use of it. But at a certain level of extension complexity it becomes awkward.
If a plugin has multiple effects, you have to either hope that they all need the same precedence relative to other plugins, or you have to split it into smaller plugins to be able to arrange the precedences correctly.
Ordering plugins becomes very finicky in general, because it's not always clear to end users which plugins interfere with which other plugins when given a higher precedence. Mistakes tend to manifest themselves only at run time, when specific functionality is used, making them easy to miss.
Plugins that build on other plugins have to document that fact, and hope that users will remember to include their dependencies (in the right place in the ordering).
CodeMirror version 6 is a rewrite of the code editor by that name, in which I'm trying to take the modular approach further. This requires a more expressive extension system. Let us go over some of the challenges involved in the design of such a system.
It's not hard to design a system that provides full control over the ordering of extensions. It is pretty hard to design such a system that is pleasant to use and allows you to combine independent extension code without a lot of finicky manual intervention.
One tempting solution, when it comes to ordering, is to work with precedence values. An example of this is the z-index property in CSS, where you specify a number that determines where an element is placed in the depth stack.
As the comically large z-index values that one often finds in style sheets illustrate, this way of specifying precedence is problematic. A given module, in isolation, doesn't know which precedences other modules are specifying. The options are just points on an undifferentiated numeric range. It can provide a huge (or deeply negative) value in the hope of hitting one of the far ends of the scale, but everything else requires guesswork.
This can be made somewhat better by defining a limited set of well-labeled precedence categories, so that extensions can classify the general “level” of their precedence. You still need some way to break ties within categories.
As I mentioned before, once you start heavily relying on extensions you might have some extensions making use of other extensions. Manual dependency management doesn't scale very well, so it would be nice if you could pull in a group of extensions at once.
But, besides making the ordering problem even more pressing, this introduces another issue: Multiple extensions might depend on a given extension, and if extensions are represented as values, you can easily end up loading the same extension multiple times. For some types of extensions, such as keymaps or event handlers, this is okay. For others, like an undo history or a tooltip library, this would be wasteful or even break things.
Thus, it seems that allowing extensions to be composed forces some of the complexity of dependency management onto your extension system. You'll need to be able to recognize extensions that shouldn't be duplicated, and load only one instance of them.
But since, in most cases, extensions can be configured, and thus not all instances of a given extension are the same, we can't just pick one instance and use that—we have to somehow merge those instances in a meaningful way (or report an error when this isn't possible).
Here I'll outline what we're doing in CodeMirror 6. I'm presenting this as a sketch of a solution, not The Definitive Solution. It is entirely possible that this system will further evolve as the library is stabilized.
The core primitive in this approach is called a behavior. Behaviors are the things that extensions can extend by providing values. An example would be the state field behavior, where extensions can add new fields by providing a field description. Or the browser event handler behavior, where extensions can add their own handlers.
From the point of view of a behavior consumer, behaviors, as configured in a specific instance of the editor, provide an ordered sequence of values, with the higher-precedence ones coming first. Each behavior has a type, and the values provided for it should match that type.
A behavior is represented as a value, which is used both to declare an instance of the behavior and to access the values the behavior has. The library comes with a number of built-in behaviors, but external code can define its own. For example, the extension that defines a line number gutter could define a behavior that allows other code to add additional markers to that gutter.
An extension is a value that can be used to configure the editor. An array of them is passed on initialization. Each extension resolves to zero or more behaviors.
The simplest type of extension is simply an instance of a behavior. When you specify a value for a behavior, it returns an extension value that produces that behavior.
A sequence of extensions can also be grouped into a single extension. For example, an editor configuration for a given programming language might pull in several other extensions, such as a grammar to parse and highlight the language, information about how to indent it, and an autocompletion source that intelligently provides suggestions for that language. So you'd have a language extension that just collects these relevant extensions and groups them together into a single extension value.
A simple version of this system could stop here, and just flatten out the nested extensions into a single array of behavior extensions. These could then be grouped by behavior type, and there you have your ordered sequences of behavior values.
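A naive version of that flattening step could look like this (illustrative only, not the library's code):

// Flatten nested extensions and group behavior values per behavior.
// An extension here is either {behavior, value} or an array of extensions.
function resolve(extensions, result = new Map()) {
  for (let ext of extensions) {
    if (Array.isArray(ext)) {
      resolve(ext, result)
    } else {
      if (!result.has(ext.behavior)) result.set(ext.behavior, [])
      result.get(ext.behavior).push(ext.value)
    }
  }
  return result // Map from behavior to its ordered array of values
}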
But we still need to address deduplication and provide more control over ordering.
A third type of extension value, unique extensions, are the mechanism for deduplication. Extensions that don't want to be instantiated twice in a single editor provide an extension of this kind. To define one, you must provide a spec type, which is the type of configuration value that the extension constructor expects, and an instantiation function, which takes an array of such spec values, and returns an extension.
Unique extensions somewhat complicate the process of resolving a collection of extensions into a set of behaviors. As long as there are unique extensions in the flattened set of extensions, the resolver must pick one type of unique extension, collect all instances of it, call its instantiation function with their specs, and replace them with (a single instance of) the result.
(There's another catch, in that these must be resolved in the right order—if you first resolve unique extension X, but then later resolving Y yields another X, that would be wrong, since all instances of X should be combined together. Since extension instantiation is pure, the system handles this by trial-and-error, restarting the process—and recording information about what it learned—when it runs into this situation.)
Finally, we must address precedence. The basic approach is still to preserve the ordering in which the extensions were provided. Compound extensions are flattened into that same ordering at the position where they occur. The result of resolving a unique extension is inserted at its first occurrence.
But extensions can assign some of their sub-extensions to a different precedence category. The system defines four such categories: fallback (take effect after the other things), default, extend (higher precedence than the bulk), and override (should probably be on top). Actual ordering happens first by category, then by original position.
So an extension that has a low-priority keymap and a regular-priority event handler could give you a composite extension built out of the result of the keymap extension (without needing to know what behaviors that is made up of) with its priority set to fallback, plus an instance of the event handler behavior.
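Sorting by category first and original position second could be expressed like this, again purely as a sketch:

const categoryRank = {override: 0, extend: 1, default: 2, fallback: 3}

// Order behavior values by precedence category first, then by their
// position in the flattened configuration.
function ordered(values) {
  return values
    .map((value, position) => ({...value, position}))
    .sort((a, b) => categoryRank[a.category] - categoryRank[b.category] ||
                    a.position - b.position)
}

ordered([
  {provides: "keymap", category: "fallback"},
  {provides: "event handler", category: "default"},
]) // → the event handler comes first, the fallback keymap last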
The way extensions can be combined without worrying about what they are doing internally seems a major win. In the extensions that we've modeled so far, which include two parsing systems that expose the same syntax behavior, a syntax highlighter, a smart indentation service, the undo history, a line number gutter, bracket matching, keymaps, and multiple selections, it seems to work well. It also anticipates some of the problems that I ran into in ProseMirror, though no one has actually built complex enough setups with this system yet to run into them.
There are a few new concepts that users need to grasp to be able to use this system, and it is decidedly harder to use than the imperative systems that are traditional in the JavaScript community (call method to add/remove effect). Having extensions properly compose seems to provide enough value to offset that cost.
ProseMirror is a Web interface component, and though some of the challenges it tackles are specific to the strengths and (especially) weaknesses of the Web platform, don't think of it as another TinyMCE alternative. Rather, it is a more general take on rich text editing that happens to be implemented in JavaScript for the browser.
Most importantly, ProseMirror is agnostic to the actual document shape, making it possible to build applications on top of this library that in the past would have required a fully custom editor implementation.
What I mean by being agnostic to document shape is ProseMirror's schema feature. The core editor has no built-in opinion about what a document looks like, and instead looks at a piece of configurable data (the schema) to figure out what kind of content is allowed and how it is structured. ProseMirror will work with precisely the custom semantic document format that you need, while still giving you the WYSIWYG style of editing that users are used to.
For example, a scientific writing app could use a schema that includes sections, footnotes, and references—two such apps, SciFlow and Fidus Writer, have been built on top of ProseMirror. Or a news organization could build a schema that reflects their content model, to provide an editor for journalists to write in. For example, The New York Times is using ProseMirror in its CMS. Or if your company has editors for a number of differing content models, using ProseMirror with different schemas can make it easier to unify your editor code. Atlassian is rolling out ProseMirror across their products, ranging from wiki to bug tracker to source hosting.
Support for collaborative editing has been a focus in ProseMirror from the start. Several aspects of the system, such as the way document updates are represented, or the way the undo history module works, have been strongly influenced by the requirements of collaborative editing. I've become convinced that this is not a feature you can robustly bolt onto an existing rich text editor.
Fortunately, these constraints, rather than forcing the design into an uncomfortable corner, helped push it in a generally beneficial direction. Several other tricky applications, such as change tracking and the ability to roll back past changes, were made possible by design decisions made for collaborative editing-related reasons.
Trying to combine the requirements of collaborative editing with a functional unidirectional data flow architecture led us to a design where the editor, instead of unilaterally updating its state, emits transactions. A transaction can be used to compute a new state with which to update the editor.
This makes it possible to almost seamlessly integrate the editor in your application's data flow cycle if you want to. In addition, having updates as first-class values makes it much easier to keep external state in sync with the editor, which allows new, powerful types of extensions.
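As a minimal sketch of that update cycle (using the prosemirror-state, prosemirror-view, and prosemirror-schema-basic packages, and assuming a bare document.body mount point), intercepting transactions looks roughly like this:

import {EditorState} from "prosemirror-state"
import {EditorView} from "prosemirror-view"
import {schema} from "prosemirror-schema-basic"

let state = EditorState.create({schema})
let view = new EditorView(document.body, {
  state,
  // The editor does not update itself: it hands us the transaction it
  // produced, and we decide how (and whether) to compute and install
  // the new state.
  dispatchTransaction(transaction) {
    console.log("Document size went from", transaction.before.content.size,
                "to", transaction.doc.content.size)
    view.updateState(view.state.apply(transaction))
  }
})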
After years of wild experiments and constant change, starting with the 1.0 release we are aiming for stability. The central modules will stay on 1.x for as long as possible, which means new releases won't require you to change your code. There's an RFC process that we'll use to get community feedback on new features.
If you're looking for a simple drop-in rich text editor component, ProseMirror is probably not what you need. (We do hope that such components will be built on top of it.) The library is optimized for demanding, highly-integrated use cases, at the cost of simplicity. But if your app is pushing the limits of what has been possible with WYSIWYG editors so far, take a look.
You can question whether ECMAScript 6 is worth incurring the inconvenience of a build step. I think it is. The various syntactic niceties provided by that dialect, mostly arrow functions and class syntax, make my code feel cleaner.
And even without ECMAScript 6, to run modularly-written things in a browser we'd still traditionally require a bundling step.
Let's do away with that first.
Bundling tends to be the thing at the very end of the pipeline. You have a “root” script, and your bundler follows all its dependencies, dumps them into a single big file, with some magic glue that makes the right module pop up in the right place. You then refer to this bundle from a <script> tag.
One thing about bundling is that it is monolithic. It requires the whole bundle file to be rewritten every time one of your modules changes. This can be slow.
For development, do we need a bundler? If we're using AMD-style modules, we don't. But I don't like the cruft created by AMD-style modules, and very few NPM modules are written in that style. If we have a directory of CommonJS-style modules, possibly using some NPM-installed dependencies, what would it take to run those directly in the browser?
A client implementation of CommonJS modules (require, exports, etc).
A way to “resolve” dependencies (going from "foo" to "../node_modules/foo/src/index.js", etc).
Fast access to the modules and some really good caching, since each page load is going to load dozens to hundreds of files.
Enter moduleserve, a small web server shim that does the node-style module resolution, serves up a client-side CommonJS implementation, and does a good job caching modules (and helping the browser cache them).
You say moduleserve mydir, and it'll serve up the content of the mydir directory on localhost:8080. You can have an HTML file in there that includes...
<script src="/moduleserve/load.js" data-module="./root"></script>
... and it will load up moduleserve's CommonJS implementation, and run the equivalent of require("./root") in it.
The implementation uses synchronous HTTP requests to fetch modules.
[stunned silence]
Browsers will show a warning in the console, but it's okay, as long as you're testing from the same machine. I can reload ProseMirror's test suite (150 modules) in 800 milliseconds (when they are cached).
So now, for a project that doesn't need any further compilation steps, we have a completely compilation-less way to try it out and run its tests during development. And it's not even slow.
You can set up moduleserve to compile your sources using Babel (and cache the compiled results), but there's a better way.
I find that I don't just need my compiled source files in one context. I need them to run my tests in the browser, to run them in node, as input for my bundler when building for production, as dependency code for other modules that I'm developing in sync, and so on.
And I don't want to set up a different compilation pipeline for each of these uses. That's a lot of duplicate work, both from me to set it up, and from the computer that is compiling everything ten times.
So I can set up a canonical dist/ directory where the compiled files go, and have Babel, or whichever compiler, watch my source files and constantly recompile when they change. That works relatively well, but annoyed me for two reasons:
I'd often read the files before the recompilation finished, leading to me getting very confused because I was drawing conclusions from the behavior of outdated files.
I'm often working on my laptop on battery power, and, since I am constantly saving files as I work on them, the amount of CPU and disk traffic caused by the constant needless recompilation bothered me.
So I wrote distfs, another small helper module. This one “mounts” a source directory, using a userspace file system to expose it as a normal-looking compilation output directory. Every time you try to read a file from that directory, the actual source is compiled, and you get the compiled output. This is cached, of course.
The advantage is that this is pull-based. Files are only compiled as needed, and when you access a file that isn't ready yet, your read blocks until it is, so that you're always sure you have the up-to-date content.
Putting those two together, I'm a much happier ECMAScript 6 developer than I was before.
Let's make a project. A simple one, consisting of two ECMAScript 6 files, an HTML file, and a single dependency. It'll show a simple form into which you can type JavaScript code, after which it will tell you whether that code is valid.
mkdir project
cd project
mkdir src dist www
# development tools
npm install babel-core babel-preset-es2015 moduleserve distfs
# actual dependency, a JavaScript parser
npm install acorn
# configure Babel
echo '{"presets": ["es2015"]}' > .babelrc
The www/ dir will be the one we mount with moduleserve. We put an index.html file like this in it:
<form>
  <textarea name=code>1 + 1</textarea>
  <button type=submit>Check</button>
</form>
<pre id=output></pre>
<script src="/moduleserve/load.js" data-module="../dist/main"></script>
We'll put the scripts in src/. This is src/main.js:
import {checkCode} from "./check-code"

document.querySelector("form").addEventListener("submit", e => {
  e.preventDefault()
  document.querySelector("#output").textContent =
    checkCode(e.target.elements.code.value)
})
And this is its dependency, check-code.js, which in turn uses our installed acorn module:
import {parse} from "acorn"

export function checkCode(code) {
  try { parse(code); return "OK" }
  catch (e) { return e.toString() }
}
Now, let's fire up our helper processes.
node_modules/.bin/distfs src dist &
node_modules/.bin/moduleserve www --port 8090 &
At this point, localhost:8090 shows our test page. When you edit one of the scripts and reload, you get the updated (compiled) code. When something crashes, you get a source-mapped stack trace pointing at the correct line numbers in the actual files.
If that still doesn't make a lot of sense, see their website or read Martin Kleppmann's excellent description.
Being a resident there meant that I was simply present in the space for a week, gave a talk, and worked with people on their projects. Working with experienced programmers is a way to diffuse some of the knowledge of those programmers, as well as a good way to make people realize that those who are considered good programmers still have to constantly google for stuff and make stupid mistakes.
The environment you'll find in the Recurse Center office is an atypical one, for the tech world.
People are working on whatever interests them, not necessarily on useful stuff. Which means mostly fun projects that are easy to relate to.
The group is amazingly diverse, in a number of ways (more on this below).
There are social rules in place to actively combat alpha-techy behavior and competitive bullshit. Not having to worry about proving how great you are makes it a lot easier to learn.
Events like “check-ins” (a group of people telling each other what they are working on) and lightning talks make it easy to find out what other people are working on, and facilitate the flow of information between people.
The result I can only describe as magic. People are getting things done, profiting from each other's expertise, doing stuff they never did before, and being immersed in (mostly the good parts of) tech culture.
In my week, I pair-programmed with about ten different people. Some of these sessions didn't really go anywhere, but a lot of them did. These are some of the things we did:
Tweaked the structure of a functional-style space invaders game to be cleaner and more actually functional. Major insight: moving triggers for sound effects from deep inside the game logic to a separate piece of code that compared the state before and after the frame, and played the appropriate sounds based on that.
Worked on a small Lisp interpreter in Python. Moved it from a Norvig-style split-on-spaces parser to a more classical tokenizer and parser.
Laid the groundwork for a tower defense game in JavaScript, experimenting with Electron and explaining the basics of npm in the process.
Worked on another small Lisp interpreter in C++. Implemented a pointer-tagging data representation, a parser, pretty-printer, and rudimentary evaluator. (I hadn't touched C++ in years, and it took us a bunch of segfaults to get a reverse iterator right.)
Wrote a simple virtual machine (the “little man computer”) first in Haskell and then in Python. Both ran example programs by the time we were done with them. The Haskell version included an assembler that went from textual programs to machine instructions. Reminded me how amazing Haskell programming can be.
Had a study/discussion group on the Raft consensus algorithm. Which was a good reason to actually read the paper. I came away with a better understanding of distributed consensus, and I hope others also picked something up.
Wasted an afternoon attacking a particularly horrid bug in the interaction between a particular node version and the Nock library. It was caused by irresponsible monkeypatching on the part of Nock. This was probably the session that most resembled normal programming—lots of debugging, not much progress.
Looked at an implementation of the WOOT algorithm for collaborative text editing on top of CodeMirror. Most of the session was spent getting me to understand how WOOT works, but we did make some progress.
So that's definitely more cool programming stuff than I get to do in an average week.
I mentioned that the crowd at Recurse Center is amazingly diverse. This is part of the center's focus, and they are doing a great job on it. The participants there, as well as the organizers, are more representative of society as a whole than any other tech group I've been around.
And that diversity works. It, along with the healthy social framing provided by the organization, creates a social atmosphere very different from your typical young-white-guy tech environment. There was no emotional vacuum. I didn't have to cringe at terrible or insensitive jokes. People weren't one-upping each other. There was no assumption of cultural homogeneity.
I do understand that diversity alone doesn't necessarily produce such an effect. I've worked in offices that were diverse by the numbers, but where the culture was still poison. You also need a healthy dose of political awareness. And you need to make sure power isn't concentrated in a specific group. And so on. Healthy culture is hard work. Thank you, Recurse Center organizers, for doing this work.
A real-time collaborative editing system is one where multiple people may work on a document at the same time. The system ensures that the documents stay synchronized—changes made by individual users are sent to other users, and show up in their representation of the document.
Since transmitting changes over any kind of network is going to take time, the complexity of such systems lies in the way they handle concurrent updates. One solution is to allow users to lock the document (or parts of it) and thus prevent concurrent changes from happening at all. But this forces users to think about locks, and to wait when the lock they need is not available. We'd prefer not to do that.
If we allow concurrent updates, we get situations where user A and user B both did something, unaware of the other user's actions, and now those things they did have to be reconciled. The actions might not interact at all—when they are editing different parts of the document—or interact very much—when they are trying to change the same word.
A lot of research has gone into this problem. And I must admit that, though I did read a bunch of papers, I definitely do not have a deep knowledge of this research, and if you find that I misrepresent something or am missing an interesting reference, I am very interested in an email that tells me about it.
A lot of this research is about truly distributed systems, where a group of nodes exchange messages among themselves, without a central point of control. The classical approach to the problem, which is called Operational Transformation, is such a distributed algorithm. It defines a way to describe changes that has two properties:
You can transform changes relative to other changes. So if user A inserted an “O” at offset 1, and user B concurrently inserted a “T” at offset 10, user A can transform B's change relative to its own change, and insert the “T” at offset 11, because an extra character was added in front of the change's offset.
No matter in which order concurrent changes are applied, you end up with the same document. This allows A to transform B's change relative to its own change, and B to transform A's change similarly, without the two users ending up with different documents.
An Operational Transformation (OT) based system applies local changes to the local document immediately, and broadcasts them to other users. Those users will transform and apply them when they get them. In order to know exactly which local changes a remote change should be transformed through, such a system also has to send along some representation of the state of the document at the time the change was made.
That sounds relatively simple. But it is a nightmare to implement. Once you support more than a few trivial types of changes (things like “insert” and “delete”), ensuring that applying changes in any order produces the same document becomes very hard.
Joseph Gentle, one of the engineers who worked on Google Wave, stated...
Unfortunately, implementing OT sucks. There's a million algorithms with different trade-offs, mostly trapped in academic papers. The algorithms are really hard and time consuming to implement correctly.
The design decisions that make the OT mechanism complex largely stem from the need to have it be truly distributed. Distributed systems have nice properties, both practically and politically, and they tend to be interesting to work on.
But you can save oh so much complexity by introducing a central point. I am, to be honest, extremely bewildered by Google's decision to use OT for their Google Docs—a centralized system.
ProseMirror's algorithm is centralized, in that it has a single node (that all users are connected to) making decisions about the order in which changes are applied. This makes it relatively easy to implement and to reason about.
And I don't actually believe that this property represents a huge barrier to actually running the algorithm in a distributed way. Instead of a central server calling the shots, you could use a consensus algorithm like Raft to pick an arbiter. (But note that I have not actually tried this.)
Like OT, ProseMirror uses a change-based vocabulary and transforms changes relative to each other. Unlike OT, it does not try to guarantee that applying changes in a different order will produce the same document.
By using a central server, it is possible—easy even—to have clients all apply changes in the same order. You can use a mechanism much like the one used in code versioning systems. When a client has made a change, they try to push that change to the server. If the change was based on the version of the document that the server considers current, it goes through. If not, the client must pull the changes that have been made by others in the meantime, and rebase their own changes on top of them, before retrying the push.
Unlike in git, the history of the document is linear in this model, and a given version of the document can simply be denoted by an integer.
Also unlike git, all clients are constantly pulling (or, in a push model, listening for) new changes to the document, and track the server's state as quickly as the network allows.
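In outline, a client's sync loop might look something like this; connection and client here are hypothetical objects standing in for the network layer and the local editor state, not a real API:

// Keep pushing our pending steps. When the server rejects a push because
// we are behind, pull the missing remote steps, rebase, and try again.
async function syncLoop(connection, client) {
  for (;;) {
    let {version, steps} = client.pendingChanges()
    let response = await connection.push({version, steps})
    if (response.ok) {
      client.confirmPush(steps.length)
    } else {
      // Someone else's changes were accepted first: fetch them and
      // rebase our unconfirmed steps on top.
      let remote = await connection.pull(version)
      client.rebase(remote.steps)
    }
  }
}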
The only hard part is rebasing changes on top of others. This is very similar to the transforming that OT does. But it is done with the client's own changes, not remote changes.
Because applying changes in a different order might create a different document, rebasing isn't quite as easy as transforming all of our own changes through all of the remotely made changes.
Whereas OT transforms changes relative to other changes, ProseMirror transforms them using a derived data structure called a position map. Whenever you apply a change to a document, you get a new document and such a map, which you can use to convert positions in the old document to corresponding positions in the new document. The most obvious use case of such a map is adjusting the cursor position so that it stays in the same conceptual place—if a character was inserted before it, it should move forward along with the surrounding text.
Transforming changes is done entirely in terms of mapping positions. This is nice—it means that we don't have to write change-type-specific transformation code. Each change has one to three positions associated with it, labeled from, to, and at. When transforming the change relative to a given other change, those positions get mapped through the other change's position map.
For example, if a character is inserted at position 5, the change “delete from 10 to 14” would become “delete from 11 to 15” when transformed relative to that insertion.
Every change's positions are meaningful only in the exact document version that it was originally applied to. A position map defines a mapping between positions in the two document versions before and after a change. To be able to apply a change to a different version, it has to be mapped, step by step, through the changes that lie between its own version and the target version.
(For simplicity, examples will use integers for positions. Actual positions in ProseMirror consist of an integer offset in a paragraph plus the path of that paragraph in the document tree.)
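To make the mapping idea concrete, here is a toy position map for a single insertion, and a deletion step being transformed through it (integer positions, as in the examples above):

// A toy position map describing "length characters inserted at pos".
// (A real map also handles deletions, replacements, and bias.)
function insertionMap(pos, length) {
  return {map: p => p < pos ? p : p + length}
}

// Transform a deletion step relative to another change's position map.
function mapDeletion(step, posMap) {
  return {type: "delete", from: posMap.map(step.from), to: posMap.map(step.to)}
}

let inserted = insertionMap(5, 1)
mapDeletion({type: "delete", from: 10, to: 14}, inserted)
// → {type: "delete", from: 11, to: 15}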
An interesting case comes up when a client has multiple unpushed changes buffered. If changes from a peer come in, all of the locally buffered changes have to be moved on top of those changes. Say we have local changes L1 and L2, and are rebasing them onto remote changes R1 and R2, where L1 and R1 start from the same version of the document.
First, we apply R1 and R2 to our representation of that original version (clients must track both the document version they are currently displaying, which includes unsent changes, and the version that does not yet include those changes). This creates two position maps mR1 and mR2.
We can simply map L1 forward through those maps to arrive at L1⋆, the remapped version of L1. But L2 was based on the document that existed after applying L1, so we first have to map it backwards through mL1, the original map created by applying L1. Now it refers to the same version that R1 starts in, so we can map it forward through mR1 and mR2, and then finally though mL1⋆, the map created by applying L1⋆. Now we have L2⋆, and can apply it to the output of applying L1⋆, and voila, we have rebased two changes onto two other changes.
Except that mapping through deletions or backwards through insertions loses information. If you insert two characters at position 5, and then another one at position 6 (between the two previously inserted characters), mapping backwards and then forward again through the first insertion will leave you before or after the characters, because the position between them could not be expressed in the coordinate space of a document that did not yet have these characters.
To fix this, the system uses mapping pipelines that are not just a series of maps, but also keep information about which of those maps are mirror images of each other. When a position going through such a pipeline encounters a map that deletes the content around the position, the system scans ahead in the pipeline looking for a mirror image of that map. If such a map is found, we skip forward to it, and restore the position in the content that is inserted by this map, using the relative offset that the position had in the deleted content. A mirror image of a map that deletes content must insert content with the same shape.
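Setting aside that mirror-map bookkeeping, the rebasing procedure for buffered local steps can be sketched like this, assuming helpers mapStep (maps a step's positions through one position map), invert (reverses a map), and step.map() (the position map a step produces once applied):

// Rebase buffered local steps over a series of remote position maps.
function rebase(localSteps, localMaps, remoteMaps, mapStep, invert) {
  let rebased = [], rebasedMaps = []
  for (let i = 0; i < localSteps.length; i++) {
    let step = localSteps[i]
    // Map backwards through the earlier local changes (mL1 for L2)...
    for (let j = i - 1; j >= 0; j--) step = mapStep(step, invert(localMaps[j]))
    // ...forward through the remote changes (mR1, mR2)...
    for (let remoteMap of remoteMaps) step = mapStep(step, remoteMap)
    // ...and finally through the already-rebased local steps (mL1⋆).
    for (let newMap of rebasedMaps) step = mapStep(step, newMap)
    rebased.push(step)
    rebasedMaps.push(step.map()) // the map this rebased step will create
  }
  return rebased
}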
Whenever content gets inserted, a position at the exact insertion point can be meaningfully mapped to two different positions: before the inserted content, or after it. Sometimes the first is appropriate, sometimes the second. The system allows code that maps a position to choose what bias it prefers.
This is also why the positions associated with a change are labeled.
If a change with from and to positions, such as deleting or styling a piece of the document, has content inserted directly before or after it, that content should not be included in the change. So from positions get mapped with a forward bias, and to positions with a backward bias.
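As a small illustration of how a biased mapping could behave, here is a toy helper (not ProseMirror code) for the insertion case:
// Map a position through an insertion of `length` characters at `insertAt`.
// A positive bias pushes a position sitting exactly at the insertion point
// past the inserted content; a negative bias keeps it before.
function mapThroughInsertion(pos, insertAt, length, bias) {
  if (pos < insertAt) return pos;
  if (pos > insertAt) return pos + length;
  return bias > 0 ? pos + length : pos;
}

// A styling change covers positions 5 to 10. Text of length 3 inserted at 5
// should stay unstyled, so the from position moves past it:
mapThroughInsertion(5, 5, 3, 1);    // → 8
// Text inserted at 10 should also stay unstyled, so the to position stays:
mapThroughInsertion(10, 10, 3, -1); // → 10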
When a change is mapped through a map that completely contains it, for example when inserting a character at position 5 is mapped through the map created by deleting from position 2 to 10, the whole change is simply dropped, since the context in which it was made no longer exists.
An atomic change in ProseMirror is called a step. Some things that look like single changes from a user interface perspective are actually decomposed into several steps. For example, if you select text and press enter, the editor will generate a delete step that removes the selected text, and then a split step that splits the current paragraph.
These are the step types that exist in ProseMirror:
addStyle and removeStyle add and remove inline styling to or from a piece of the document. They take from and to positions.
split splits a node in two. It can be used, for example, to split a paragraph when the user presses enter. It takes a single at position.
join joins two adjacent nodes. This only works if they contain the same type of content. It takes from and to positions that should refer to the end and start of the nodes to be joined. (This is to make sure that the nodes that were actually intended are being joined. The step is ignored when another node has been inserted between them in the meantime.)
ancestor is used to change the type of a node and to add or remove nodes above it. It can be used to wrap something in a list, or to convert from a paragraph to a heading. It takes from and to positions pointing at the start and end of the node.
replace replaces a piece of the document with zero or more replacement nodes, and optionally stitches up compatible nodes at the edges of the cut. Its from and to positions define the range that should be deleted, and its at position gives the place where the new nodes should be inserted.
The last type is more complex than the other ones, and my initial impulse was to split it up into steps that remove and insert content. But because the position map created by a replace step needs to treat the step as atomic (positions have to be pushed out of all replaced content), I got better results with making it a single step.
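Purely as an illustration of the kind of data a step carries (the field names here are made up, not ProseMirror's actual representation), a replace step could be pictured like this:
// Delete the range 5..9 and insert one new paragraph node at position 5.
var replaceStep = {
  type: "replace",
  from: 5,   // start of the range to delete
  to: 9,     // end of the range to delete
  at: 5,     // where the replacement nodes are inserted
  nodes: [{type: "paragraph", content: [{type: "text", text: "hi"}]}]
};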
An essential property of real-time collaborative systems is that they try to preserve the intention of a change. Because “merging” of changes happens automatically, without user interaction, it would get very annoying when the changes you make are, through rebasing, reinterpreted in a way that does not match what you were trying to do.
I've tried to define the steps and the way in which they are rebased so that rebasing yields unsurprising behavior. Most of the time, changes don't overlap, and thus don't really interact. But when they overlap, we must make sure that their combined effect remains sane.
Sometimes a change must simply be dropped. When you type into a paragraph, but another user deleted that paragraph before your change goes through, the context in which your input made sense is gone, and inserting it in the place where the paragraph used to be would create a meaningless fragment.
If you tried to join two lists together, but somebody has added a paragraph between them, your change becomes impossible to execute (you can't join nodes that aren't adjacent), so it is dropped.
In other cases, a change is modified but stays meaningful. If you made characters 5 to 10 strong, and another user inserted a character at position 7, you end up making characters 5 to 11 strong.
And finally, some changes can overlap without interacting. If you make a word a link and another user makes it emphasized, both of your changes to that same word can happen in their original form.
Silently reinterpreting or dropping changes is fine for real-time collaboration, where the feedback is more or less immediate—you see the paragraph that you were editing vanish, and thus know that someone deleted it, and your changes are gone.
For doing offline work (where you keep editing when not connected) or for a branching type of work flow, where you do a bunch of work and then merge it with whatever other people have done in the meantime, the model I described here is useless (as is OT). It might silently throw away a lot of work (if its context was deleted), or create a strange mishmash of text when two people edited the same sentence in different ways.
In cases like this, I think a diff-based approach is more appropriate. You probably can't do automatic merging—you have to identify conflicts and present them to the user to resolve. I.e. you'd do what git does.
How should the undo history work in a collaborative system? The widely accepted answer to that question is that it definitely should not use a single, shared history. If you undo, the last edit that you made should be undone, not the last edit in the document.
This means that the easy way to implement history, which is to simply roll back to a previous state, does not work. The state that is created by undoing your change, if other people's changes have come in after it, is a new one, not seen before.
To be able to implement this, I had to define changes (steps) in such a way that they can be inverted, producing a new step that represents the change that cancels out the original step.
ProseMirror's undo history accumulates inverted steps, and also keeps track of all position maps between them and the current document version. These are needed to be able to map the inverted steps to the current document version.
A downside of this is that if a user has made a change but is now idle while other people are editing the document, the position maps needed to move this user's change to the current document version pile up without bound. To address this, the history periodically compacts itself, mapping the inverted changes forward so that they start at the current document again. It can then discard the intermediate position maps.
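Under the same simplifications as before, the shape of such a history might be sketched like this; the step objects with invert and map methods are again hypothetical stand-ins rather than the real history module.
// Each item stores an inverted step plus the position maps of everything
// applied after it, so the step can later be mapped to the current document.
function History() { this.items = []; }

History.prototype.record = function(step, map) {
  this.items.forEach(function(item) { item.mapsSince.push(map); });
  this.items.push({inverted: step.invert(), mapsSince: []});
};

History.prototype.undo = function(apply) {
  var item = this.items.pop();
  if (!item) return;
  var mapped = item.mapsSince.reduce(function(s, m) {
    return s && s.map(m);
  }, item.inverted);
  if (mapped) apply(mapped); // real code also records the map this produces
};

// Periodic compaction: map the stored steps forward once, so they start at
// the current document again, then drop the intermediate maps.
History.prototype.compact = function() {
  this.items.forEach(function(item) {
    item.inverted = item.mapsSince.reduce(function(s, m) {
      return s && s.map(m);
    }, item.inverted);
    item.mapsSince = [];
  });
};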
]]>Well, that's not really what happens. But the effect is the same. I keep building complex, demanding pieces of code and then giving them away. The actual mechanism is usually that I think of some technical concept, find out that it hasn't been done yet, and through some mix of curiosity and ego, just have to see if I can do it.
Here's the newest damage: ProseMirror, a browser-based rich text editor. Though this time I'm not exactly giving it away: I'm running a crowd-funder to open-source it, and I've thought a bit about how to make after-release maintenance sustainable.
Didn't I just talk about implementing things that have “not been done yet”? And aren't there at least a hundred browser-based rich text editors out there?
Yes, and yes. But none of the existing projects take the approach that
I think would be ideal. Many of them are firmly rooted in the old
paradigm of relying on contentEditable
elements and then trying to
sort of clean up the resulting mess. This gives us very little control
over what the user and the browser are doing to our document.
What do we need control for? For one thing, it makes it much easier to keep the document in a sane state. If the document is only modified by your code, you can define these modifications so that they preserve the invariants you want to preserve, and you can ensure that the same thing happens on different browsers.
But more importantly, it allows you to represent these modifications in a more abstract way than “something changed around here, and this is the new document state”. And that is extremely helpful when implementing collaborative editing—to effectively merge conflicting changes from multiple users, it helps to have an accurate representation of the intent of the changes.
ProseMirror does create a contentEditable
element that it displays
its document in. This gives us all the logic related to focus and
cursor motion for free, and makes it much, much easier to support
screen readers and bidirectional text.
Any actual modifications made to the document are captured by handling the appropriate browser events, and converted to our own representation of these modifications. Assuming relatively modern browsers, this is easy for most types of changes. We can handle key events to capture typed text and things like backspace and enter. We can handle clipboard events to make copy, cut, and paste work. Drag and drop are also exposed through events. Even IME input fires relatively usable composition events.
Unfortunately, there are a few cases where browsers don't fire events that describe user intent, and all you get is an after-the-fact input event. This happens when picking a correction from the spell check menu, for example, and also when using key combinations to type special characters (for example “Multi-e =” to type “€” on Linux). Fortunately, all such cases that I have run across so far involve simple, character-level input. We can inspect the DOM, compare it to our representation of the document, and derive the intended modification from that.
When a modification is made the editor's representation of the document is changed, and then the display (the DOM element on the screen) is updated to reflect the new document. By using a persistent data structure for the document—making a change creates a new document object, without mutating the old one—we can use a very fast document-diffing algorithm to make only the DOM updates that are actually necessary. This is somewhat similar to what React and its various spin-offs do, except that ProseMirror works with its own representation of the document, not with a general-purpose DOM-like data structure.
This document representation is explicitly not HTML. It is also a “semantic” representation of the document, and a tree-shaped data structure that describes the structure of the text in terms of paragraphs, headings, lists, emphasis, links, and so on. It can be rendered as a DOM tree, but also as Markdown text, and any number of other formats that happen to be able to express the concepts it encodes.
The outer part of this representation, the part that deals with paragraphs, headings, lists, and so on, is especially DOM-like in its structure—it consists of nodes with child nodes. The content of paragraph nodes (and other block elements such as headings) is represented as a flat sequence of inline elements, each of which has a set of styles associated with it. This works better than using a tree structure all the way, as the DOM does. It makes it easier to enforce invariants like not allowing text to be emphasized twice, and allows us to represent positions in paragraphs as simple character offsets, which are easier to reason about than positions in a tree.
Outside of paragraphs, we are forced to work with a tree. So a position in the document is represented by a path, which is a sequence of integers denoting the child indices at each level of the tree, and an offset into the node at the end of this path. This is how the cursor position is represented, and how the positions at which modifications should happen are stored.
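For instance, a position in the third top-level node, first child, first child, at character offset 4, might look like this (an illustrative object literal, not the exact class the library uses):
// Path of child indices from the document root, plus an offset into the
// paragraph found at the end of that path.
var position = {path: [2, 0, 0], offset: 4};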
ProseMirror's current document model mirrors that of Markdown, supporting precisely the things that can be expressed in that format. In the future, you will be able to extend and customize the document model you want to use in a given editor instance.
The editor currently comes with two styles of user interface. One is the classical toolbar at the top. The other shows tooltips above your selection to do inline-level styling, and a menu button to the right of the currently selected paragraph for block-level actions. I rather like the latter, since it gets out of your way when you are not using it, but I expect many people will prefer the familiar toolbar.
But these interfaces are implemented as modules outside of the editor core, and other interface styles can be implemented on top of the same API.
Key bindings are also configurable, closely following CodeMirror's model. The functionality that keys are bound to is available as named commands, and can also be run from scripts using the execCommand method.
Finally, there is a module called inputrules
that can be used to
specify that something should happen when text matching a given
pattern is typed. It can be used for things like “smart quotes”, but
also to make a list appear if you type “1.” and press space.
I mentioned collaboration before. A significant amount of the work that went into this project went into making it support collaborative real-time editing. I wrote another blog post about the technical details, but the concept is roughly this:
When a modification is applied to the document, it creates a new document as well as a position map, which maps positions in the old document to positions in the new document. This is needed to, for example, move the cursor in response to modifications.
Being able to map positions makes it possible to “rebase” modifications on top of other modifications by mapping the position(s) at which they should be applied. There's more to this, and I rewrote this aspect of the system several times before getting it right, but I am pretty confident I ended up with something that works.
In a collaborative scenario, when a client makes modifications, they are buffered locally and sent to the server. If another client submitted its modifications before ours arrive, the server responds with “nope, apply these modifications first”, and the client takes the modifications, rebases its own ones on top of them, and tries again. When they go through, they are broadcast to all other clients, making sure everybody stays synchronized.
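A minimal sketch of that client-side loop, with hypothetical helpers (sendSteps, rebaseOnto, replaceLocalSteps) standing in for the real transport and document plumbing:
var unconfirmed = []; // local steps the central server has not accepted yet

function localChange(step) {
  unconfirmed.push(step);
  sendSteps(unconfirmed); // try to push our buffered steps
}

// Called when the server answers "apply these other steps first".
function onRemoteSteps(remoteSteps) {
  // Move our buffered steps on top of the incoming ones, update the
  // displayed document, and try pushing again.
  unconfirmed = rebaseOnto(remoteSteps, unconfirmed);
  replaceLocalSteps(remoteSteps, unconfirmed);
  if (unconfirmed.length) sendSteps(unconfirmed);
}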
Who do I expect to use this?
On the one hand, sites that are using Markdown or some similar format for input might want to offer a more easily learnable interface for their less technical users, and then just convert the result to Markdown.
On the other hand, sites which have been offering traditional rich text input but want control over what comes out might want to move to ProseMirror, since having the editing experience directly reflect and enforce your constraints beats cleaning up messy HTML and hoping for the best.
Finally, I expect the solid support for collaborative rich-text editing will clear out a niche that doesn't really exist yet, allowing people to move some things that they'd currently do on Google Docs into their own products.
Sounds interesting? See how the crowd funding campaign to open-source this is doing.
]]>Fitting copyable things into the capitalist market economy is something of an unsolved problem. The standard model of commerce is to sell our products by the unit. But here, everyone who gets their hands on a single unit can create an infinite number of units, ruining our market.
Thus, we've invented copyright, using the law to try and make the copyable less copyable, and allowing us to go back to the classical model of selling by the unit. Copyright is rather effective at protecting the interests of the producer of the copyable goods—it doesn't fully prevent copying, but it inhibits it enough to allow many authors, musicians, and software houses to turn a profit.
Open source software (and similarly open-licensed works in other media) tilts the balance the other way—it leaves the consumer's ability to copy the works largely unconstrained. This is a way to optimize the usage of the work, the value humanity as a whole gets from it. At the cost, of course, of the ability of the producer to capture value.
I am very happy that the open source concept took off. I personally benefit hugely from being able to use many pieces of great software, without the bother or the financial burden of having to purchase them. I am even more happy that people who aren't as privileged as I am also have access to all this software.
I have spent over 8 years working mostly on open source software. Sometimes I was paid, often I was not. This was possible for me because I have been financially secure my whole life, and giving away software that may or may not help my career at some point in the future was never a seriously risky choice. By the time I started taking on financial responsibilities, my open source career had been bootstrapped well enough that it wasn't risky anymore.
You could say that lack of financial and social responsibilities is the entry ticket to the open source world. And that shows—we're mostly a young white guy club. The open source endeavor is, in a way, missing out on lots of programming talent, both that of people who never are in a position to enter our world to begin with, and that of people who grow up, get a mortgage, and move on to a safe commercial job.
(There are certainly other causes for the lack of diversity in open source, but I think this one is a significant factor.)
What I want, both for long-time maintainers like myself and for people starting out, is a way to get more money flowing into open source. There have been several stories in the news about underfunded, critical projects, but I think the problem affects most projects that aren't directly associated with a company: There are few direct incentives to pay for open source, so even projects with enormous amounts of users often simply don't see enough money to pay for proper maintenance. The slack is sometimes picked up by the aforementioned young people without responsibilities, but that is rarely sustainable.
The effect is that the resulting software isn't as good as it could be and that some niches that would benefit greatly from having a good open implementation simply do not get addressed. In terms of costs and benefits, this is entirely stupid. The money it takes to properly pay a few maintainers is, for most types of software, very small compared to the value created by having that software globally available.
In terms of game theory, on the other hand, this is a rather obvious outcome—one actor's contribution to a project's maintenance is likely to cost that actor more than the direct benefit they receive from the slightly improved maintenance level. Everybody likes good roads, but nobody likes to pay taxes. So everybody sits on their asses and waits for someone else to do something.
But roads can not be copied, whereas software can, so the maintenance work needed to supply people with software does not grow with the number of users the way it does with roads. For projects that have the potential to cater to a lot of users, the economy of scale is very much in our favor. I think we can work this out.
I am about to launch a new project, which is likely to seriously increase the amount of maintenance work I have to do for non-paying strangers. I've been thinking a lot about the way to approach this in the past months. These are the options I've considered...
I could keep all rights to my software and go back to the warm embrace of classical commodity capitalism.
But that'd reduce the impact of my software to a tiny circle of paying customers. I'd have to do marketing to get new customers, rather than the thing simply spreading on its own merit, and lots of people to whom the software would be helpful won't use it because they can't afford it or simply don't know about it. Very unsatisfying.
In this model, I would license the code under the Affero GPL license, which is much like the GPL, but without the loophole that allows you to use such code on a networked server without distributing your code to the users of that server.
This would mostly mean that commercial use of the code would be a lot harder, since the companies using my code would have to open-source much of their own code. The trick would then be to sell licenses to such companies that allowed them to use the code on other terms.
I've seriously considered this approach, but there are two concerns that made me abandon it. Firstly, I expect it would inhibit use of the library almost as much as the closed source model, because GPL licenses are complicated and scary and their legal implications are not terribly well understood.
But more problematically, this model requires me (or my company) to play the role of a central actor. I can only sell licenses if I actually own the copyright over the whole project. That means that only people who explicitly sign away their copyright to me would be able to contribute. And it means that anybody using the code commercially would be entirely dependent on me—if I vanish or quadruple my prices, they have a problem. I personally would be unlikely to use such a project, or to contribute to it.
So that's out as well.
This is a long shot, but it is what I'm going to try. I am legally licensing the code under an MIT license, which is very liberal. Contributors keep the copyright over their contributions on the condition that they license them under the same license, so that the project is a free thing that can be continued and forked without my involvement.
But along with this legal licensing situation, I am going to emphasize, in the docs, the license file, and the communication surrounding the project, that free-loading is not socially acceptable. Along with this, I will provide convenient mechanisms to donate. The code of financial conduct would be something like this:
If you are a non-commercial user, don't worry about it.
If I fix a bug you reported or add a feature you wanted and you have the financial means, a one-time tip is much appreciated. Even if this is unlikely to add up to serious money, it takes the one-sidedness out of the process of responding to user requests.
If you are extracting value from your use of my software, set up a proportional monthly donation.
The monthly part is the important thing here. Having to periodically beg a user base to please contribute to a donation drive again is a drag, and not very effective. Convincing users to donate for as long as they get value from the software gives a maintainer a more or less predictable, low-maintenance source of compensation for their work.
Along with this, I will run a crowd-funding drive to launch the project. This is a way to try and get paid for the huge amount of work that went into the initial implementation (as opposed to future maintenance). It worked well with my Tern project.
In the past year or so, a lot has been written about the problem I am trying to address here. With luck, the IT community will start to become more aware of the issue. Ideally, I'd like to move towards a culture where setting up contributions to open source maintainers whose work your company depends on becomes the norm. Individual projects setting such expectations can help move us in that direction.
I also hope that services making it easier to transfer money from open source users to open source maintainers, like Bountysource and Patreon, become more widely used. There is definitely room for more experimentation in this space. For example, software and conventions to help channel contributions made to a project to individual contributors who are doing the work, in a fair way.
But these things, if they work at all, only work for proven projects with many users. To lower the financial bar for people interested in starting open source work, we need something different. One possibility would be well-funded projects channeling some of their money towards mentorship and compensation of junior contributors.
Another avenue would be to build organizations that act as incubators for open source projects, helping people with a basic income and a helpful environment as they work on open source. New projects are hit-and-miss, of course, and motivating companies to fund these things is likely to be difficult. But the money needed to provide basic financial security for a starting programmer is trivial compared to the value created by having good software freely available. You could see such organizations as doing public infrastructure work, and if sold in the right way they just might be sustainable.
(For another recent piece on the same topic, see Noah Kantrowitz's Funding FOSS.)
]]>The problem is this: CodeMirror has an internal model of the document that is being edited, and is displaying a representation of this model on the screen, using the browser's DOM. These have to be kept synchronized, for obvious reasons. However, updating the DOM is costly, especially if you are going to interleave those updates with reading layout information from it.
Most changes to CodeMirror's display require re-positioning the cursor or selection. To do this, CodeMirror reads layout information to figure out the exact placement of the relevant characters. Thus, if every function that touches CodeMirror's model of the document went ahead and immediately updated the DOM display, the result would be mind-bogglingly slow.
This is somewhat similar to the problem that is being solved by the React library. That library implements a “shadow DOM” data structure, which is cheap to modify or rebuild. Only after your program has finished fiddling around with this DOM is it compared to the actual, live DOM, which is updated, in one go, to conform to the current shadow DOM. The result is a higher-level view of a batch of display changes, which allows for more efficient updating.
The Atom editor has simply switched to React for their rendering code. (Later edit: they are switching back.) CodeMirror takes a different approach.
The central concept in this approach is that of an “operation”, which groups a bunch of actions performed on an editor together. Operations are implemented with a higher-order function that does something like this:
CodeMirror.prototype.operation = function(action) {
if (this.alreadyInOperation()) return action();
this.setUpOperation();
try { return action(); }
finally { this.finishOperation(); }
};
This means that the scope of an operation corresponds to the dynamic extent of a function call—everything done by that function and the other functions it calls is part of the same operation. It also means that operations can be cheaply nested, since the operation method only needs to start an actual operation if we aren't already inside of one.
All CodeMirror's public methods that change the document are implicitly wrapped in an operation, so client code only has to worry about doing its own wrapping when explicitly grouping multiple things together.
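For example, client code that makes several edits in a row can wrap them in an operation so the display is only touched once at the end (cm is assumed to be an existing CodeMirror instance):
// All three changes are applied to the document model immediately, but the
// DOM and cursor are only updated once, when the operation finishes.
cm.operation(function() {
  cm.replaceRange("foo", {line: 0, ch: 0});
  cm.replaceRange("bar", {line: 2, ch: 0});
  cm.setCursor({line: 2, ch: 3});
});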
For each operation, a data structure is allocated to track the things that were done in this operation. For example, there is invalidation information about the lines of code that are visible on the screen (remember, not the whole document is rendered). If lines 10 to 40 are visible, and we make a change to line 20, that is recorded. If we make a change to line 50, that is not interesting, since that line hasn't been drawn anyway. Similarly, there are flags that indicate that other information, such as the width of the widest line in the document, needs to be recomputed.
As an example, say you type a character on line 3, which changes the content of a line, and then that character triggers an auto-indent, which changes the line again, and then the closebrackets addon's event handler runs, adding another character to the line. Though a whole bunch of things happened, at the end of the operation started by the input event handler, there is a single invalidated line, line 3, which needs to be redrawn, once. The cursor similarly only needs to be repositioned a single time.
To track which lines have been drawn, and whether those lines have changed in the meantime, CodeMirror keeps a view data structure. This data structure describes a range of the document (which, directly after a redraw, is the currently visible range), in terms of visual lines, not logical lines. A visual line is the unit of rendering. Such a line normally corresponds to a logical line, but when there is folded or replaced text present (see the markText method), pieces from multiple logical lines might visually end up on the same line.
The display data structure contains references to the DOM nodes that represent the given visual lines in the current DOM, as well as data and caches related to character measurement (which has to be invalidated at the same time as the DOM nodes themselves).
Whenever the text, or the styling of the text, inside a single line is
changed, the relevant code will call the internal regLineChange
function, which checks if the line is currently rendered, and marks it
as invalidated if it is. Different kinds of invalidation are
distinguished, so that many changes (such as a class change, gutter
change, or line widget addition) can be applied without a full redraw
of the line.
When a range of lines changes, possibly adding or removing lines in between, another function (regChange) is called, which patches up the current view to reflect the new situation. This is somewhat involved, due to the way our coordinate system (line numbers) is changing under our feet when lines are added or deleted, and further complicated by the fact that folding information might also change.
During a display update (at the end of an operation), the display is re-synced with the editor's current scroll position, cutting off parts outside of the viewport and adding new parts that were not previously covered. The display is then actually redrawn, ensuring that all visual lines in the view are actually present and up-to-date in the DOM again.
Having a single function drive all DOM updates makes it a lot easier to order these to prevent unnecessary relayouts. But, unlike most UI code, CodeMirror is dependent on the output of the browser's layout algorithm in a number of cases. As mentioned before, it uses real measurements of character position to place the cursor and selection, rather than assuming everything is left-to-right unstyled monospace text that allows us to compute positions ourselves. In addition, the faked scrollbars need to be resized and shown or hidden depending on the actual size of the document.
The display update code is organized in a rather convoluted way, caching measurements and ordering actions by their place in the update pipeline, rather than by subsystem. This way, it keeps the number of relayouts down to one or two for most operations. Unfortunately, it is very easy to break this when changing the code, and I have not yet set up an automated way to detect regressions. Chrome Dev Tools’ timeline view has been an invaluable help in finding sources of unnecessary relayouts.
The operation-managing code has recently (in version 4.4) been changed to not just synchronize display updates within a single editor, but across CodeMirror instances. That means that if you have a bunch of editors on your page, for example showing different views on a bigger document, updating all of them is a lot cheaper than it used to be.
]]>For the first edition of my book (Eloquent JavaScript), I wrote my own markup format and a Haskell program to parse it. That was, of course, a big waste of time and made it needlessly hard for people to contribute to the book or translate it.
When writing the second edition, I used AsciiDoc as a source format, since I had had good experiences with that before. This worked reasonably well, though I'll discuss some problems I had with it later.
The nice thing about being both a programmer and an author is that you can build custom tooling for the text you are working on. While rewriting Eloquent JavaScript, I built up a big suite of tools and scripts to build, check, and customize the book. This blog post describes those tools. They are probably of no direct use to anyone else, but they might inspire similar tools, and make for an amusing technical story. The source and build tools for the second edition can all be found in the book's github repository.
The version of the book on the website is interactive. It allows the reader to edit and run the example code, and to solve the exercises inline.
To support this, I include CodeMirror (which was originally written for this very purpose, to use in the first edition of the book) and use it to turn code snippets into editors when clicked. When the code is executed, this is done in a hidden iframe element, which has its console.log wired up to output its arguments below the editor, rather than writing them to the normal JavaScript console. When running HTML examples, the iframe is not hidden, but instead also shown below the editor.
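Roughly, that wiring looks like this sketch (the output element id and the editor variable are assumptions for the example; editor would be the CodeMirror instance holding the snippet):
// Create a hidden iframe to run the reader's code in.
var frame = document.createElement("iframe");
frame.style.display = "none";
document.body.appendChild(frame);

// Route console.log inside the frame to an output node below the editor.
var output = document.getElementById("snippet-output");
frame.contentWindow.console.log = function() {
  var line = Array.prototype.slice.call(arguments).join(" ");
  output.appendChild(document.createTextNode(line + "\n"));
};

// Evaluate the snippet in the frame's global scope.
frame.contentWindow.eval(editor.getValue());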
An interesting dilemma is the question of what happens when the reader inputs a program that creates an infinite loop. Back when the first edition was being written, browsers still aborted scripts that ran for too long. But that has gone out of style, and modern browsers tend to just crash the tab instead. This is not a very good user experience.
So I stole Remy Sharp's approach from jsbin, which is
to mangle the code that the user enters, inserting time checks at the
end of loop bodies so that scripts that jump to a previous point in
the program can be terminated when they run for too long. To do this
properly, I run the code through
Acorn to parse it, and patch up the
relevant points using information from the parse tree. The actual code
inserted increments a global variable, and only calls into the more
expensive time-checking function when that variable modulo 1000 yields
zero. When the time is exceeded and the user confirms the termination
of the script through a confirm
dialog, an exception is raised.
It is still possible to create an infinite loop with creative use of continue or by catching and ignoring the exception, but for typical, accidental infinite loops, this approach works well.
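The effect of the instrumentation is roughly the following (the names are invented; the real checks are generated from the Acorn parse tree rather than written by hand):
// Injected once, before the user's program.
var __steps = 0, __start = Date.now();
function __checkTime() {
  if (Date.now() - __start > 5000 &&
      confirm("This program has been running for a while. Stop it?"))
    throw new Error("Script terminated by user");
}

// A user-written loop, after instrumentation: a cheap counter check is
// appended to the loop body, and only occasionally calls __checkTime.
function doSomething() {}
while (true) {
  doSomething();
  if (++__steps % 1000 == 0) __checkTime();
}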
In the first edition, the reader was required to run all code snippets they came across, or they would run into errors about undefined variables when trying to run code that depended on earlier definitions. This was very confusing and needlessly complicated. The build scripts for the second edition allow snippets of code to be marked as part of the base code for a chapter, and will extract such code into files that are automatically loaded when running code in the interactive environment.
My scripts can currently build this interactive HTML version, an ePub book (zipped XHTML), and two flavors of LaTeX: one in the style of my publisher, and one to build the free PDF file.
Using the same sources for both an interactive website and a static book is somewhat challenging. In the interactive version, the reader is encouraged to see the output of a program by running it. This is not possible on paper or in an ebook, so there the text has to be slightly different, and screenshots or output dumps have to occasionally be included.
This was solved by defining some AsciiDoc macros to only include
pieces of text in a given type of target. This makes the sources a bit
more clunky, like ifdef
in C does, but differences were rare enough
to keep this manageable.
Complicated formulas cannot be sanely described in pure AsciiDoc, so I occasionally had to include some raw LaTeX or HTML in the sources. This was done with another set of macros, and does mean that if I want to add another target format, I will have to add code specifically for that format as well.
The publisher insisted on some conventions that I don't personally find helpful, such as classical-style quotes, where punctuation after a quoted piece of text is moved into the quotes, and title case in section headings. I wrote the sources using my preferred style, and have a post-processing node script that transforms them to the other style when building the LaTeX files for the paper book.
All the invocations of asciidoc, xelatex, inkscape, and the variety of node scripts that make up the build system are orchestrated by a makefile.
I bear no great love for make, since at a certain level of complexity makefiles tend to become utterly unreadable spaghetti, but for a small project like this they work wonderfully, allowing me to specify the dependencies between the files in a succinct and precise way, and to rebuild only what has to be rebuilt when I change something.
The website and the ebook files distributed from there are kept up to
date by a post-update
git hook that runs make html book.pdf book.epub
whenever I push something new to the repository.
When a piece of code sits in a non-runnable text file, and is being occasionally edited to evolve with the surrounding text, it is guaranteed to get damaged at some point. You'll introduce a syntax error, or refer to a variable that you renamed in the meantime. Several embarrassing bugs snuck into the code in the paper version of the first edition this way.
To prevent that from happening this time around, there is a script that extracts code snippets from the book, assembles them into a whole program, and tries to run them. Throughout the book, I use the convention of including the output that console.log produces in comments that start with // →. When such comments are present, the test runner also verifies that the output produced is the expected output.
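A simplified version of that check could look like this sketch (the base-code assembly is omitted, and the regular expression for the arrow comments is an approximation):
// Run a snippet with a captured console, then compare what it logged
// against the "// →" comments found in the snippet itself.
function testSnippet(code) {
  var logged = [];
  var fakeConsole = {log: function() {
    logged.push(Array.prototype.slice.call(arguments).join(" "));
  }};
  new Function("console", code)(fakeConsole);

  var expected = (code.match(/\/\/ → .*/g) || []).map(function(line) {
    return line.replace(/\/\/ → /, "");
  });
  expected.forEach(function(line, i) {
    if (logged[i] !== line)
      throw new Error("Expected " + line + ", got " + logged[i]);
  });
}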
Some code, such as the skeleton code into which people are to write their exercise solutions, or code that intentionally illustrates a mistake, does not run as-is. The test runner recognizes comments (in the AsciiDoc sources, not the code snippets themselves) like // test: no to disable testing of the next snippet, or // test: wrap to run it in its own scope (to prevent leakage of variables or strict declarations).
To be able to test browser code, I am using the jsdom npm package. It is rather slow, and not very robust, but it does enough to allow the simple code in the book to run.
I also wrote a simple link checker that verifies that the targets of internal links that occur in the text actually exist, and are only defined once.
AsciiDoc has no maintained LaTeX backend. It does come with such a backend, but that is a half-working mess.
I can see why this is. LaTeX is not a nice target language. It requires all kinds of things to be escaped. But only in some contexts. In other contexts, escaping them will break things. Or they have to be escaped differently. It might be that AsciiDoc's configuration is powerful enough to do this, but neither the person who started the LaTeX backend nor I could figure out how.
Thanks to the fact that I only use a rather narrow subset of AsciiDoc, I could get passable output after a few days of tweaking my configuration file. Some problems I was unable to solve in a sane way, so I set up a post-AsciiDoc filter in the form of a node script that fixes these. Horrible kludges all around, but it works for my purpose.
This project thoroughly burned me out on LaTeX in general. The output looks great, way beyond what HTML is capable of, but you are pretty much condemned to learn its obscure, primitive language and write really ugly code in it if you want to customize anything for which a package does not already exist. On top of that, a LaTeX source file is everything but declarative—you are expected to mix all kinds of ad-hoc layout code into your content. CSS is not perfect, but it sure is more pleasant than this.
I had my pre-processing script convert internal links from 00_foo.html#anchor format to simply anchor, and ensured that all anchors were unique in the book. This way, internal links work seamlessly across the various document formats.
I am making heavy use of SVG pictures in this book. For some silly reason (probably because it is a hundred years old and no serious update has happened in ages) LaTeX, which compiles to vector graphics, can not use SVG vector graphics. So I had to convert those to PDF files, and patch the LaTeX output to refer to the PDF rather than the SVG files.
Additionally, there are still browsers that don't render SVG, so PNG fallback pictures were also needed.
I tried to use ImageMagick's convert
command to do the file conversions, but that makes a terrible soup of
SVG files. Fortunately, Inkscape, the
open-source SVG editor, can be invoked from the command line to
directly convert images. I integrated that into my build process
instead.
To fall back to PNG in the HTML files, I used a simple script that
detects SVG <img>
support, and if it isn't present, simply updates
the src
property of the image tags to point at the PNG files
instead.
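Such a script can be tiny. This sketch assumes each PNG sits next to its SVG under the same name:
// The classic feature test for SVG support in <img> tags.
var supportsSVG = document.implementation.hasFeature(
  "http://www.w3.org/TR/SVG11/feature#Image", "1.1");

if (!supportsSVG) {
  var images = document.getElementsByTagName("img");
  for (var i = 0; i < images.length; i++) {
    var src = images[i].getAttribute("src");
    if (/\.svg$/.test(src))
      images[i].setAttribute("src", src.replace(/\.svg$/, ".png"));
  }
}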
AsciiDoc has a lot going for it—it is pleasant to read and write (as opposed to XML), powerful enough to express serious documents (as opposed to Markdown), and easy to extend (as opposed to almost everything else).
But, like all “clever” text formats, it suffers from bad corner cases. Parsing is largely done with regular expressions (which makes it very easy to add syntax), but occasionally those go wrong, as regular expressions tend to. True to the system's name, most of its regular expressions are written as if everything is ASCII. Using markup characters next to Unicode punctuation (emdash or quotes) often confuses the parser. And there were moments where syntax inexplicably stopped working when spread out over multiple lines.
A bigger problem is the poor documentation of the configuration format. It is a rather simple format, and looking at the examples that come with the distribution (the various backends are basically just configuration files) got me quite far. But when I needed to do something complicated, usually when trying to generate LaTeX, I kept running into weird behavior in its templating language, which collapses white space in strange ways, and for whose syntax I could find absolutely zero documentation.
The worst issue, though, was that I ran into bugs where some macros would corrupt some internal data structure, and pieces of text would end up being replaced by senseless binary soup.
For a next project, I will probably first see if AsciiDoctor fits the bill. This is a rewrite of AsciiDoc designed to be faster, easier to extend, and less weird. But there too I could find little docs on the configuration process, and by the time I was frustrated enough with AsciiDoc to consider switching, I had already dug myself into a rather deep hole by depending on AsciiDoc-specific features.
]]>When Angus Croll first told me that he was working on a book that solves simple programming exercises in the style of various famous literary authors, I couldn't help but suspect that such a book would be completely pointless, and he wouldn't sell ten copies. After reading the result of his efforts, I still firmly stand by the latter prediction, but I have to admit that I actually enjoyed this book.
If Hemingway Wrote JavaScript is a slim, prettily typeset book that is divided into five programming exercises, and has five authors solve each of the exercises. It accompanies each solution with a brief description of the author's background and style, and a detailed walk-through of the code.
Wild experiments are what moves a genre forward. And this is a format of programming book that's certainly never been tried before.
The exercises are very simple—think Fibonacci and factorials. But some of the solutions are absolutely charming. Here's Jorge Luis Borges finding prime numbers, transforming the problem into a story about long-legged monsters climbing a set of stairs:
// They speak (I know) of finials, newels and balustrades
// of hidden spandrels and eternally clambering, broad-gaited beasts...
var monstersAscendingAStaircase = function(numberOfSteps) {
var stairs = []; stepsUntrodden = [];
var largestGait = Math.sqrt(numberOfSteps);
// A succession of creatures mount the stairs;
// each creature's stride exceeds that of its predecessor.
for (var i = 2; i <= largestGait; i++) {
if (!stairs[i]) {
for (var j = i * i; j <= numberOfSteps; j += i) {
stairs[j] = 'stomp';
}
}
}
// Long-limbed monsters won't tread on prime-numbered stairs.
for (var i = 2; i <= numberOfSteps; i++) {
if (!stairs[i]) {
stepsUntrodden.push(i);
}
}
// Here, then, is our answer.
return stepsUntrodden;
}
You can tell that the author spent a lot of time on these snippets, polishing the style and adding intricacies—like the fact that the inner loop starts at i * i in the code above. Which it can, but it takes a second look to figure out why. If you approach these as you would normally approach code, they are just misguided blobs of confusion and weird variable names. But if read as stories and logic puzzles, they are very interesting. Several of the solutions initially looked like they could not possibly work. Yet when I tried them out, they all worked—except for the intentionally broken solution at the end of the book.
Another thread that runs through the book is a critique of the limiting, constrained coding style advocated by tools like JSLint. Angus argues that coding in a uniform subset of a flexible language like JavaScript robs us of a lot of expressivity and artistic freedom.
This is a point I mostly agree with. Though the style of literary JavaScript explored in this book doesn't seem to hold much promise for production code, it can take a place in our cultural background as a useful reminder of what you can do with this language.
All in all, I had a good time with this book. But then, I should, because I am the precise target audience: a JavaScript programming literature nerd with a taste for bizarre coding styles. In the unlikely event that this also describes you, give this book a try. (If there's ten of us, we might even disprove my prediction about the book's sales.)
As a bonus, here's my attempt to have Hunter S. Thompson compute prime numbers.
// It's four in the morning, and I'm desperately late with this
// goddamn assignment. I'm going to go ahead and assume you'll use
// this in your book, and I expect a fat cheque in the return mail.
function indivisibleFreaks(limit) {
var freaks = []
// Line them all up...
offTheHook: for (var number = 2; number <= limit; number++) {
// ...and drop the screwups with a .44 Magnum.
for (var aim = 0; aim < freaks.length; aim++)
if (number % freaks[aim] == 0) continue offTheHook;
freaks.push(number);
}
// What you bastards want with these poor guys, I can only imagine.
return freaks;
}
]]>I happily use == null comparison expressions, even though linter software tends to disapprove.
JavaScript defines both undefined and null as separate values. The first comes up in many constructs in the core language—an uninitialized variable, a parameter that wasn't passed, a missing field in an object, or a call to a function that doesn't return anything all have the value undefined. The null value has the ring of a C null-pointer, and tends to be common in library interfaces—a child-less DOM node has a firstChild property of null, getElementById with a non-existent ID gives you null, and so on.
I have found the distinction between null and undefined to be mostly useless—a cognitive burden without merit.
The interesting distinction is usually between 'non-values' and actual values.
The == null (or != null, as it may fit) pattern is...
equivalent to a comparison with == undefined.
shorter than a typeof X == "undefined" check.
accepting of both undefined and null values, saving external code the bother of worrying what they are passing in.
an alternative to the stricter === null.
a way to do without _.isNull and similar helper functions, which I inexplicably see pop up in many projects.
So I've made it policy in my code, whenever possible (which is pretty much always), to treat null and undefined interchangeably when accepting them as input. This saves both myself and the users of my libraries one more thing to worry about.
(For output, I tend to stick to null, so as to not force my sins on the poor souls who have to code under the tyrannical eye of JSLint.)
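In code, that policy amounts to something like this small sketch (getOption is a made-up example function):
// A single == null test catches both null and undefined, so callers don't
// have to care which of the two they end up passing.
function getOption(options, name, fallback) {
  if (options == null) return fallback;     // null or undefined
  var value = options[name];
  return value == null ? fallback : value;  // missing, null, or undefined
}

getOption(null, "tabSize", 4);                 // → 4
getOption({tabSize: undefined}, "tabSize", 4); // → 4
getOption({tabSize: 2}, "tabSize", 4);         // → 2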
Years ago, before accidentally rolling into this JavaScript career, I mostly programmed Common Lisp. Emacs integration for Common Lisp is divine. It does just about everything, from interactively exploring and modifying a running system, to debugging, to looking up documentation for you on the web. But the main two things that made day-to-day programming in that environment so wonderful were that it automatically showed me a list of argument names when I was typing arguments to a function call, and that it allowed me to jump to the definition of something at a single keystroke (and then jump back to where I came from at another keystroke). This meant that I hardly ever had to break my flow to go look up the interface for a function, or wonder how or where something is implemented.
One thing about machine intelligence is that machines don't develop it spontaneously. Someone has to put in the time to teach them. Since programming-language geekery is my thing, and having even half-decent JavaScript integration for my editor would save me a lot of time, I've started exploring this problem.
Early this year I got a prototype off the ground that did a passable job at the basic features that I wanted (argument hints, jump-to-definition, and of course decent auto-completion). Using that prototype, I duped the crowd into kindly spotting me some money to continue working on it. The result is called Tern (github), which is now an editor-independent static analysis engine that is being integrated into several different editors.
In this post, I'll try to document how Tern works.
As a first step, Tern parses your code using Acorn. If it's not currently syntactically valid because you are in the middle of editing it, it still parses it, using the error-tolerant parser in Acorn.
It then, in a first pass, builds up a representation of the scopes in the program. If we ignore with and some of the nasty behavior of eval (which Tern does), scopes in JavaScript are entirely static and trivial to determine. This already gives us enough information to auto-complete variable names, and jump to the definition of variables.
But we also want to complete property names, show argument hints, and jump to the definition of functions and types that are, for example, stored in a property.
For this, we need to figure out the types of the program's elements. This is the central problem that Tern tries to solve. In a language like JavaScript, it is a rather challenging problem.
The type inference in Tern is done by a second pass (after the one that builds up the scopes) over the code. You can picture this pass as building up a graph that represents the way types flow through the program.
Each variable and object property, as well as some kinds of expressions, will have an abstract value associated with them. This is a set of types that have been observed for this element. These are the nodes in the graph. The edges consist of forwarding information. For example, if the expression y = x is found in the program, the abstract value for variable x will be set to propagate all types it receives to the abstract value of y. Thus, if x is somehow known to be a string, y will also get type string (but might, depending on how it is used, receive more types than just that).
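The core of such a graph fits in a few lines. The AVal class below is an illustrative reimplementation of the idea, not Tern's actual code:
// An abstract value: a set of observed types plus edges to other abstract
// values that every incoming type is forwarded to.
function AVal() {
  this.types = [];
  this.forward = [];
}

AVal.prototype.addType = function(type) {
  if (this.types.indexOf(type) > -1) return; // already known
  this.types.push(type);
  // Propagate the new type along all outgoing edges.
  this.forward.forEach(function(target) { target.addType(type); });
};

AVal.prototype.propagate = function(target) {
  this.forward.push(target);
  // Types seen so far flow into the new target immediately.
  this.types.forEach(function(type) { target.addType(type); });
};

// y = x creates an edge from x's abstract value to y's.
var x = new AVal(), y = new AVal();
x.propagate(y);
x.addType("string"); // y now also has "string" in its type set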
The graph may be initialized from one or more definition files,
which are a simple JSON data format that contain information about
global variables and their types. Such files for the basic JavaScript
environment and the interface exposed by browsers are included in the
Tern distribution. These tell Tern that, for example, the parseFloat
global holds a function type of one string argument that returns a
number.
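As an approximation of what such a definition might express (the exact syntax here is an assumption; the real files ship with the Tern distribution):
// Two globals: parseFloat is a function from a string to a number,
// and NaN is simply a number.
var defsFragment = {
  "parseFloat": "fn(string: string) -> number",
  "NaN": "number"
};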
Let's see what the graph would look like for this pointless program.
var x = Math.E;
var y = x;
var z;
x = "hello";
You see blue circles that represent the abstract values for our three variables and for the Math.E property. The orange boxes are actual types. The basic ECMAScript definition file has given Math.E the type number. Since it is then assigned to x, an edge is added from Math.E to x, which causes x to also get the number type. y is initialized to x so it gets all of x's types. When x is assigned a string value, the string type flows into it and, consequently, also into y. The variables now have both the string and number type associated with them.
Variable z is never written to or read, so it just sits there sadly in its corner, without any connection to the rest of the program, and its abstract value remains empty.
Note that in actually executing this program, x would end up with a string and y would still be simply a number. Tern ignores control flow and pretends everything in the program happens, basically, at the same point in time. This is an approximation that's not correct, but that makes the graphs a lot easier and cheaper to construct. For typical programs, it doesn't have much of an adverse effect on the quality of the inference.
Propagation of types is not always direct. In many cases, the algorithm will assign a specific propagation strategy, which may contain arbitrary code, to handle types that appear in a source abstract value.
The most common case of this is function calls. For every function call in the program, a propagation strategy object is created that knows the arguments that were passed to the function, and has an abstract value that represents the result type of the call. The type of the callee is set to propagate to this object. When it receives a function type, it will set the argument types of the call to propagate to the argument variables of the function type, and set the return type of the function type to propagate to the call's result type.
This means that function types must hold references to the abstract
values for their argument variables, and for an abstract value that
represents their return type. (And, in fact, another one that
represents their this
type, though we'll ignore that in the next
example.)
function foo(x, y) { return (x + y); }
function bar(a, b) { return foo(b, a); }
var quux = bar("goodbye", "hello");
You can see the function types, as orange boxes, containing
(references to) abstract values. Function declarations will cause such
types to be created, and added to the variable that names the
function. The purple boxes are propagation strategies. There are two
calls in the program, corresponding to the two purple call boxes. At
the top is a simple box that handles the +
operator. If a string is
propagated to it, it'll output a string type, and if two number types
are received, it'll output a number type.
The arrows going into the arguments and coming out of the result type of the call propagators are added as soon as the call is seen. The arrows coming out of the arguments and going into the results are added when an actual function type is found for the callee.
You can follow the string types that are passed to bar through this graph, being joined together by the + operator into a single string, and then flowing back down through the return types and into the quux variable.
In this program, every value was used in a single way, causing almost all the elements to neatly have a single ingoing and a single outgoing edge. This is not typical for real code. Every assignment to a variable will add another incoming edge, and every time it is read, another outgoing edge is added. In a big program, variables that are referred to a lot will have hundreds of outgoing edges. It very quickly becomes impossible to visualize the type graphs for real programs in two dimensions. (I already had to go through three throw-away sketches before I found a layout for the above graph that was not a tangled mess of crossing arrows.)
Tern defines a number of other propagation strategies, handling things like property reads and writes, and creating instances from constructors. These work in ways analogous to the call strategy described above.
An important feature of these type graphs is that no matter in which order they are built, the final result will be the same. If you call a function that is only defined later in the program (or in a file that is defined later on), your call node will sit there patiently waiting until a type is propagated to it. And when that finally happens, it'll set up the necessary edges in exactly the same way that it would have done if the function had been seen before the call.
One important property of this inference algorithm is that it will only give types to things when the types are actually observed being assigned or passed to it. This is fine, most of the time, but can be a problem for functions that are never called—no types will flow into the function's arguments.
Why would you write a function and never call it? Usually because you are writing a library.
Tern doesn't fundamentally solve this problem, but it uses a trick to
be able to show at least something, even for abstract values that
don't have any definite value. By looking at the propagations made
from an abstract value, it is often possible to get some clue on the
way it is used. If, for example, it is propagated to a variable that
usually holds numbers, it might just be a number. If its charCodeAt
property was accessed and the only type we know of with a charCodeAt
property is the built-in string type, it's probably supposed to be a
string.
Guessed types are not added to the actual type graph. They are only returned when directly asking for the type of an abstract value that has no known type. This means that if you have a function like this...
function gurble(frobs) {
return frobs.reduceRight(function() { /* ... */ }, 0);
}
... Tern will consider the frobs
argument to be an array when
displaying argument hints, but won't actually propagate anything to
the return type, because it isn't sure (and once a type is propagated
into the graph, there's no way to un-propagate it).
Here's a pattern that's relatively common in JavaScript code:
function extend(proto, props) {
function Ctor() {}
Ctor.prototype = proto;
var obj = new Ctor();
if (props) for (var prop in props) obj[prop] = props[prop];
return obj;
}
This creates a new object that extends another object with some new properties.
The type of the resulting object is not 'visible' in the source code. It is dynamically created at run-time. Yet, if a piece of code uses this pattern to set up its types, Tern will completely fail to understand it unless it can properly interpret what the code is doing.
And now we're off into dodgy hack land. In order to meet this
challenge, Tern uses a special treatment for for
/in
loops that
appear to be copying properties. When it encounters them, it assumes
that the properties from the source object will be copied to the
target object. It ignores control flow (conditionals and such) and
simply copies all properties.
That solves half the problem. Now if you call extend
once, it will
create a new object type at the new
expression, copy in the
properties from props
, and return it, resulting in more or less the
type you'd expect.
But, if you create multiple types using extend
, you'll run into
another problem. All of them will have the return type of extend
flow into them, and the return type of extend
will be all of them (a
function has a single abstract value representing its return type).
Thus, you'll end up with a useless mess that will have little to do
with your intended object types.
To fix that, Tern uses a heuristic to determine whether a function is a type manipulating function, and if it decides that it is, its return type will be computed in a different way. Instead of building a type graph for the function's content once, and connecting that to all argument and return types, the function is 'reinterpreted' for every call site, creating propagations only for the current arguments and return type.
(Conceptually, all functions that aren't recursive could be interpreted this way. It would probably produce superior results, but also be much, much more expensive.)
The heuristic to determine whether a function manipulates types or not
is rather superficial, and will only catch the pattern in the example
above and a few similar things. Every time a function assigns to a
prototype
property, instantiates a locally defined constructor, or
copies properties in a for
/in
loop, a score is incremented. When
this score, divided by the size of the function, crosses some
arbitrary threshold, the function is marked as a type manipulator.
The previous section already mentions how Tern's model of representing the return type of a function with a single abstract value can lead to bad results. A similar issue is seen with functions like this:
function randomElement(arr) {
return arr[Math.floor(Math.random() * arr.length)];
}
If you first pass it an array of numbers, and then an array of booleans, the number and boolean types will both be flowing into the result of any call site of the function.
To gracefully deal with functions like this, Tern tries to determine,
after it analyzed a function, whether it is a generic function that
has a return type that directly depends on its input type. To do this,
it again makes use of the type graph, by performing a graph search
trying to find a path from one of the arguments to the return type. It
doesn't search very deep, since that'd quickly get expensive, but for
example the randomElement
function above has a simple path from
arr
, through a propagation object that propagates the element type
of the array, to the return type.
For calls to randomElement
, instead of propagating from the
function's return type abstract value, we simply take the element type
of the type (or types) of the first argument, and use that. This'll
correctly type randomElement([true])
as a boolean, and
randomElement([1])
as a number.
The inference engine makes up most of the complexity of the project. But above it is another layer, the layer that most client code talks to, which is not entirely trivial. In order to support an editor, we must also be able to maintain an up-to-date view on a code base as it is being edited.
This is where the server component comes in. It keeps track of a set of files, analyzes them, and answers queries about the code. Here's an example of a request to a Tern server (requests are JSON documents):
{
  "query": {
    "type": "completions",
    "file": "myfile.js",
    "end": 20
  },
  "files": [
    {
      "type": "full",
      "name": "myfile.js",
      "text": "var foo = document.f"
    }
  ]
}
It uploads a small file to the server, and asks for completions at character position 20. The server might respond with:
{
  "completions": ["firstChild", "forms"],
  "start": 19,
  "end": 20
}
Your editor could then use this information to provide actual auto-completion.
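For reference, driving such a server programmatically from Node might look something like this. This is a sketch based on the tern npm package's Server class; check the server documentation for the exact option and method names.
// Sketch: ask a Tern server for completions, using the request format shown above.
var tern = require("tern");

var server = new tern.Server({});
server.request({
  query: {type: "completions", file: "myfile.js", end: 20},
  files: [{type: "full", name: "myfile.js", text: "var foo = document.f"}]
}, function(error, response) {
  if (error) throw error;
  console.log(response.completions); // something like the response shown above
});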
The query
/files
format for requests is intended to make it easy for
editor plugins to update the server's view of the code as they are
making requests. The server will save the files, so that if multiple
requests are made without the document changing, there's no need to
re-upload the code again and again.
There is also support for sending only a fragment of a file, providing some context for the request but requiring this fragment to be analyzed in the context of a previously built up type graph and scope tree for the whole file. This is useful when dealing with big files—analyzing 6000 lines of code can take up to 300 milliseconds, which is too much for an interactive interface. If the 6000 lines have been analyzed in advance, in the background, then when auto-completion is triggered only the 100 or so lines around the cursor need to be re-parsed and re-analyzed, and the server can usually respond in under 30 milliseconds.
Another responsibility of the server is managing plugins that
influence the way files are loaded. There are currently plugins for
RequireJS and node.js, which will automatically try
to load dependencies, so that modules loaded through define
or
require
will have their types understood by Tern.
In closing, I want to give a quick overview of related work. This list is incomplete. There are tools (such as VJET for Eclipse) on which I couldn't find any detailed technical information, and there are likely others that I am not even aware of. You are encouraged to write me if you have corrections or additions.
Since VS 11, 'Intellisense' for JavaScript is nothing short of amazing. It works, as far as I understand it, by actually running your code in a magic invisible way, instrumenting the Chakra engine to ignore I/O and cut off loops that run too long or recursion that goes too deep, and then inspecting the resulting JavaScript environment to find out what actual types were created for a given variable or expression. This makes it amazingly accurate, even when you're doing very odd things with your types. Downside is that it'll sometimes not be able to run the code that you need a completion for with its actual input types (it needs to find a code path leading to that code, which can be tricky), and thus fail to provide completions.
If you're curious about this technique, there's a very watchable video of a presentation by Jonathan Carter on this subject.
The Scripted editor is a relatively new code editor with a focus on editing JavaScript. It comes bundled with a plugin that parses your code and runs a type inference algorithm on it in order to provide completions. This is similar to the approach taken by Tern, except that its inference algorithm is very different. From a quick reading of the code, it appears to be mostly bottom-up, with a second pass that does some propagation. See also this blog post.
jsctags
is a tool that generates Ctags files
from JavaScript code. doctorjs
is a fork of jsctags
which
uses a more advanced form of abstract interpretation in its inference
algorithm.
Both projects have been more or less abandoned, and don't yield very
good results. They did influence (mostly through Patrick Walton's
doctorjsmm
experiment) the design of Tern a lot. But I've
always felt that the static, non-interactive way Ctags (or Etags)
work, which undoubtedly made sense back when they were invented, is
awkward and not really appropriate anymore. Tern uses a 'live' data
source—a process that updates its analysis as you change the
code—instead.
The inference algorithm that doctorjsmm
tries to implement is based
on the type inference subsystem of the SpiderMonkey JavaScript
engine. Tern's algorithm is also closely based on this technique
(outlined in this paper).
However, a JavaScript compiler and an editing assistance plugin have quite different goals, and thus also quite different requirements for their type inference algorithms. SpiderMonkey's compilation process has to be lightning fast, and Tern can take (a little) more time if that happens to improve results. SpiderMonkey has to be conservative, since producing incorrect information to the compiler might result in segmentation faults or exploitable bugs in the compiled code. Tern can more freely use approximations, since getting an inaccurate auto-completion doesn't usually cause disaster. And finally, SpiderMonkey's type inference can use feedback from the actual running code. Tern doesn't run the code, so it doesn't have that luxury.
]]>This is a doctrine that I've come to esteem highly: write code that solves the current problem you have, and not a bunch of other, similar problems that you can imagine someone may have in the future.
And such a system will have to grow, almost without exception, as new use cases come up. But I argue that a non-minimal system, no matter how much time was spent on a genius architecture up front, will also have to change to deal with new realities. I haven't yet met an engineer who was able to accurately predict future uses of her systems. I certainly can't. Such a more abstract system would have more code, and thus more inertia—it takes more work to pull it into a different direction.
All code is, in principle, throw-away code. I might not actually throw it away, but I am prepared to, and fully expecting to, change it in radical, major ways after I write it. Thus, rather than writing code in a way that makes it flexible enough to adjust to future circumstances, I focus on keeping code small and simple enough to extend it to future circumstances without much effort.
Of course, as use cases accumulate, systems do get bigger, and abstractions are built up. But these, all being responses to actual real-world situations, are adding obvious value to the software. And, if the use case they address is found to be misguided, or they turn out to not address it very well, they are pitilessly scrapped and replaced by improved approaches.
So, CodeMirror was set up as a system with zero unused abstractions. A potentially surprising design decision in the CodeMirror API, motivated by this principle, is that documents were not separate from editor instances. For the initial textarea-like use case, this was not needed. An editor had a single document associated with it, and though it internally had a specific representation for this document, the only way the outside world could access it was as simple string data.
On the bright side, this means there wasn't any nonsense like
editor.getView().getDocument().getValue()
. You'd simply say
editor.getValue()
instead. And that way will remain
supported—interfaces conceived as the simplest thing that solves the
problem tend to be wonderfully straightforward and direct. Adding
features to a system by non-invasively working them into an existing,
simple interface tends to produce better interfaces than directly
exposing an internal model that is more complicated than the typical
use case, forcing users to deal with the indirection even when they
don't need it.
Recently, CodeMirror has come to the point where it is quite obviously no longer just a replacement for a textarea. Most users do use it as such, and it is a design goal to remain frictionlessly useable in that way. But projects like Light Table and Brackets are full-featured code editors, pushing into the same space as traditional desktop editors.
And such editors can do things like display multiple views on the same document. As in, really the same document, not a copy of the document that's being kept in sync with some event listeners and some duct tape. For example, the views should be able to share a single undo history.
Another use case that both Light Table and Brackets pushed was being able to create a subview on a document—show a piece of a bigger document (say twenty lines out of a thousand line document) in an editor, keeping them strictly synchronized with the corresponding range in the parent document.
I mentioned before that, internally, there was already the concept of a pretty well-separated document data structure. But, in order to make an interface public, simply making it accessible is rarely the whole story. You are also, if the interface is any good at all, saying that the concepts exposed can be recombined in every useful way that the user can come up with.
So whereas CodeMirror internally had the invariant that a document and an editor were married together till death do them part, a public document / editor interface would have to cater to a much bigger range of use cases—putting a new document into an editor, sharing documents between editors (the motivating use case), and all the consistency-maintenance issues that come with those.
But, on the flip side, doing the most general thing possible is also
not optimal. Specifically, if the interface is so general that it also
allows nonsensical situations to come up, it'll force us to write code
to handle these situations. As a concrete example, if editors and
documents are separate, should we allow editors without documents to
be created? My answer is no, we most certainly should not. Such an
editor would be in a thoroughly exceptional, yet more or less useless
state. A whole bunch of assumptions about CodeMirror instances (that
you can call getValue()
on them, for example) would not hold for
such editors. Or, we'd have to write special-cased code to somehow
make them hold (if (nodoc) return ""
). That would bloat the library,
introduce lots of interesting new potential bugs, and generally waste
everybody's time.
Thus, the trick is to move the interface towards something that's flexible enough, but not too flexible. And also to stay backwards compatible with the existing, straightforward API.
I went back and forth a few times, started on a few implementations that I had to back-track from, but feel I did find something satisfactory in the end.
One initial idea that I gave up is that documents and views should be different concepts. That sounds obvious, doesn't it? A view would have a scrolling position, a cursor position, and a document. A document would just be text. Separating responsibilities, and all that.
But merging the two allows us to establish some invariants, such as that a document always has a cursor position associated with it, and invariants, when they don't get in the way, are good for software. It also cuts out a whole layer of cruft in the interface that users have to deal with. And the only cost is that, for the rare case where you don't need a selection or scroll position to be tracked for a specific document, there'll be a few unused objects allocated. Objects that make up less than a percent of the memory footprint of even a small-sized document.
But, you may protest, if the cursor position is associated with the document, how are you going to have multiple views on a single document? Good question. The answer involves another non-intuitive design decision. There are no multiple views on a single document. Instead, there are 'linked' documents—when documents are linked, they propagate changes made to them to each other, in a way that (barring bugs in the code or data corruption) ensures they stay 100% synchronized.
Having two document representations for what is essentially a single document sounds sort of wasteful. But if you refer back to the entry on CodeMirror's document representation, you'll see that this is a representation designed around the need to index lines by vertical offset. And those vertical offsets depend on a line's visual height. And there is no guarantee that multiple views will render lines in exactly the same way. Thus, this data structure will have to be duplicated for each view anyway.
Having a canonical central data structure, when each view needs its own height index, is mostly an extra waste of memory. Since these height indices can share the actual string values that represent the document's content, their memory overhead is reasonable.
Earlier, I mentioned shared undo histories in the context of split-view / shared-document situations. In a classical split view, you see two copies of the same document, side by side, and as you make a change on one side, it appears on the other side. And when you then move your focus to the other view and hit undo, the common expectation is that it will undo the last change you made, in any view, rather than the last change you made in the view where you issued the undo command.
Thus, if the two views are showing documents that are linked together, those documents should share a single undo history.
Now consider the situation where we create a subview of a large document, for example, a user issues a command 'show-definition-of', and we pop up a mini editor at the bottom of the screen that shows the definition of the thing the cursor was over. This mini editor is a subview into another open document, which we may have edited before, or which we may still edit as the mini editor is being shown.
Now, if you trigger an undo in this mini view, after a change has been made in another view on that document, at a point that's not visible in the mini view, what should happen?
Most people agreed that silently undoing something in far-away text
that isn't part of the focused view is usually not a good thing.
Several heuristics were proposed for kinda sorta doing the right
thing, but I judged them all too unpredictable and random. Instead,
when you create a linked document by calling the linkedDoc
method
on a document, you can specify whether the histories of the existing
document and the newly created one should be shared or not.
Linked documents without a shared history are slightly tricky. Changes will propagate from A to B, but without being added to B's undo history. An undo history represents change sets that can be applied to the current document to go back in history (and, potentially, redo change sets to go forward). But when the current state of the document isn't in sync with the top change set in the history, such a change set can not be cleanly applied. An analogy with git, or other revision control systems, applies well here. We have a set of patches, and we want to be absolutely sure that they can actually be applied to our document without conflicts.
Thus, to borrow more terminology from git, when a change comes in from a document that doesn't share a history with us, we 'rebase' our existing history. This rebasing, in CodeMirror, is a rather simple and destructive process that simply updates these patches, when they don't conflict, to account for changes in line numbers, and when they do conflict, discards them. Thus, if document A and B are linked without a shared history, and I edit line 10 in A, and then edit line 10 in B, my undo event in A will be lost, since it conflicted with a more recent edit in B.
Because document links are (exclusively) created by 'deriving' a new document from an existing one, the relations between a set of linked documents form a tree (and are stored as such). This means that there are no cyclic links possible, and traversing such a tree, for example to propagate a change, is easy—just walk the graph, recursing into all neighbors except for the one we came from.
It also means that sets of documents with a shared history form 'islands' in the tree. Documents that share history with no one can be seen as single-document islands. By storing the shared-history flag in the edges of the graph, it is very easy, when traversing it, to notice when we are entering another island. This is used by the code to, for example, know when it should rebase histories when propagating a change.
The linked-document model allows subviews to be modeled in a straightforward way. When creating a linked document, you can pass a range of lines in order have the new document only contain a slice of the original document.
I opted to make line numbers 'cross-document'—meaning that if you create a subview that contains lines 100 to 120 of some larger document, the first line of the sub-document will have number 100, not zero. This removes the invariant that the first line in an editor is zero, which required some adjustments, but also means that, as a change propagates between editors, it stays inside the same 'coordinate system'.
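In code, the subview case looks roughly like this (the option names are those of the CodeMirror 3 API as I remember them; check the manual):
// Sketch: show lines 100 to 120 of a big document in a second, linked editor.
var bigText = "..."; // stands in for the full thousand-line document
var main = CodeMirror(document.getElementById("main"), {value: bigText});
var sub = main.getDoc().linkedDoc({from: 100, to: 120, sharedHist: true});
console.log(sub.firstLine()); // 100, not 0 (line numbers are cross-document)
var mini = CodeMirror(document.getElementById("mini"));
mini.swapDoc(sub); // edits now propagate between the two documents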
Of course, changes still have to be clipped when they propagate from a document to a subview of that document. And unfortunately, how to clip them, when the change overlaps with the boundaries of the subview, is an underconstrained problem. There are multiple credible solutions. Underconstrained problems are the worst kind of problems, because usually, none of the possible solutions are perfect.
Say, in the example subview that holds lines 100 to 120 of its parent document, that someone selects lines 0 to 110 in this parent document, and then pastes a twenty-line replacement. Obviously, the first ten lines of the subview must be removed, but as for the question whether the replacement text should end up in full, in part, or not at all in the subview, there is no correct answer. One outcome could be that the subview is left only ten lines big (20 to 30, containing the range that used to be 110 to 120, pre-change), another could be that it includes the pasted text (spanning line 0 to 30).
I ended up going with the first solution (not including the inserted text in the subview), on the intuition that a change that starts or ends outside of a narrowed subview probably has no business in that subview. Fortunately, changes like this, that both delete and add multiple lines, are relatively rare, so people won't run into my arbitrary decision very often.
A lot of methods that used to exist on editor instances, basically everything that involves document inspection, document modification, or selection, are now defined on document instances instead. For backwards-compatibility, and in order to keep the interface easy to use, I'd like to continue supporting calls of these methods directly on the editor instance.
So now there's a loop in the CodeMirror source that goes over a long list of method names, and for each of them, adds a method to the CodeMirror prototype that forwards the call to the editor's current document.
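In essence, it is a simplified version of this (a sketch, not the literal source):
// Forward document-related methods on the editor to its current document.
["getValue", "setValue", "getLine", "getCursor", "setCursor",
 "replaceRange", "markText" /* ...and many more */].forEach(function(method) {
  CodeMirror.prototype[method] = function() {
    var doc = this.getDoc();
    return doc[method].apply(doc, arguments);
  };
});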
The changes outlined above landed on CodeMirror's master branch right after yesterday's 3.01 release. The updated manual describes them, specifically in the section on document management.
]]>Thus, the kind of software that usually passes for a parser is not much help. A typical parser understands it as its responsibility to diligently check its input for validity, and complain, rather than return a useful result, when a problem is found.
That's not what we want. We need access to an abstract syntax tree, not an error message.
One solution that comes to mind is a technique often used by some compilers: when they come across a syntax error, they'll report it, skip forward to a heuristically determined 'safe point' (often the start of the next statement, function, or block) and continue parsing.
However, skipping forward will drop the current context on the floor. And since the invalid part of the code is very likely to be the part that the cursor is currently at, the constructs used near this position are of great interest. Throwing them out more or less defeats our purpose.
At first, I believed I could cleverly reuse my existing parser: catch parse errors, apply a small local fix to the input (move a brace, blank out a confusing stretch of text), and try parsing again.
That didn't work out. To make any kind of informed judgement on where to move braces or how big a range to blank out, you need a lot of context information. Thus, my 'light-weight error fixer' was quickly growing into a heavy-weight monster, and still pretty ineffective at, you know, actually getting a file to parse.
In retrospect, I'm not sure why I ever thought this was a good idea. I'm just documenting it here to dissuade other people from going down a similar route.
What worked a lot better, and was actually not that much work in my case (where the target language is JavaScript, which isn't a very complicated language), was to just write a new parser.
I reused the tokenizer from Acorn, my JavaScript parser. I also reused the general structure of the parser, by working from a copy and then editing that, which saved a lot of work, and, by allowing me to work from a well-tested algorithm, helped avoid a lot of mistakes.
The new parser is guided by two principles: never raise an error, no matter what the input looks like, and, when in doubt, don't consume input, but continue as if the expected construct was there.
Some of the changes made to the parser were mechanical—simply kill all
code that verifies non-essential properties of the code, for example
whether a labeled break
corresponds to a known label. Others added
local heuristics, for example when no while
keyword is found after a
do
body, simply invent a bogus expression to take the role of the
condition.
The original parser uses an operator called expect to assert that the next token is of a certain kind, raising an error otherwise.
The loose parser uses a similar operator, which, if the token isn't
there, looks at the next two tokens, and skips forward if one of them
matches. If none match, it'll just return, not consuming any token—and
the parse continues as if the token was there.
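In sketch form, using hypothetical helpers (eat() consumes a token if it matches, peek(n) looks n tokens ahead without consuming anything, and next() consumes one token):
// The tolerant variant of expect: never raise an error, skip a little
// junk if that helps, and otherwise pretend the token was there.
function expect(tokenType) {
  if (eat(tokenType)) return;              // the token we want is right here
  for (var n = 1; n <= 2; n++) {           // look at the next two tokens
    if (peek(n).type === tokenType) {
      for (var i = 0; i <= n; i++) next(); // skip the junk, consume the token
      return;
    }
  }
  // Nothing matched: consume nothing, and continue as if the token was there.
}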
Using such a careless style of parsing gets you surprisingly far. But it still leaves open the problem of a missing or superfluous brace leading to a wildly incorrect interpretation of everything after it. To get around that, the loose parser relies heavily on indentation to guide the way it parses blocks. Basically, when a new statement (or new object property, or new array element) is indented less than the first one in the block, it assumes that the block (or object literal, or array) ends there.
That works very well for properly indented code. But it will go wrong when indentation is sloppy, or people do cute things like not indenting their debug statements. This is why the loose parser should be used as a backup for when the regular parser fails, in order to get at least some kind of syntax tree, but never as a primary parser.
Finally, a subtle problem of this parsing strategy—when in doubt,
don't advance the token stream—is that it is very easy to get into
infinite loops. For example, say you're trying to parse an argument
list, and there's a bunch of nonsense in the middle such as foo(a, ], c)
. The argument list parser obviously has a loop in it, calling
parseExpression
for each argument. When it runs into the ]
,
parseExpression
returns a dummy node, because it couldn't find any
expression there. The argument list parser then optionally skips a
comma (it parses foo(a c)
as foo(a, c)
—tolerant as it is), and
continues. If that was all there was to it, the above expression would
land us in an infinite loop.
To work around that, a few select parts of the parser have special cased code to ensure that the token stream does, somehow, advance on every iteration of their loops. For the argument list parser, it will check that the expression it parsed isn't a dummy placeholder, and if it is, discard a token and ignore the dummy expression. This, combined with the fact that it'll bail out when it finds a token indented less than or equal to the line that started the list, makes it return more or less sane results for most inputs.
This new parser is a submodule of my Acorn parser, and lives in
the acorn_loose.js
file. If you've come up with a use for such
a parser, or just want to see if you can break it, check out the git
repository, and have fun. It runs both in Node and in the browser.
It's not terribly mature or stable, but it does what it claims to do.
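If I remember the entry point correctly, using it from Node looks more or less like this (the module path and the parse_dammit name should be verified against the repository):
// Sketch: run the error-tolerant parser over code the strict parser rejects.
var acornLoose = require("acorn/acorn_loose");

var broken = "function gurble(frobs { return frobs.";
var ast = acornLoose.parse_dammit(broken, {});
console.log(ast.body[0].type); // likely a FunctionDeclaration, despite the errors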
There have been several major improvements, including support for notifications and bulk copying, so a release was in order, if only to prevent the impression that the library was not being maintained.
So there it is, Postmodern version 1.19. Get it from the project page.
]]>First, some background. Why would we want to fake a scrollbar to begin with?
In order to remain responsive when huge documents are loaded in, CodeMirror does not render the whole document, but only the part of it that is currently scrolled into view. This means that the amount of DOM nodes it creates is limited by the size of the viewport, and the browser relayouts triggered by changes to the text are relatively cheap.
So if you have a document that is higher than the vertical height of
your editor, the actual DOM inside the editor will be a big, largely
empty div
that defines the height (and thus the scrollable area) of
the editor content, with inside of that, absolutely positioned to
cover the part that's currently scrolled into view, a smaller element
that contains the lines of text that you are looking at (and a slight
margin of a few lines around that).
When you scroll, you do not want to see a bunch of empty space coming into view, so the editor has to make sure that, once you're scrolling past the viewport margin, it updates the viewport and renders the content that came into view.
Unfortunately, browsers fire scroll
events after doing the actual
visual scrolling. This means that if you quickly drag the scrollbar,
you'll see empty space coming into view before the scroll
event
handler even has a chance to update the viewport. And that is the
reason for our first lie.
If the user's scrolling happens on an element that is not our actual
content container, the scrolling won't be able to move empty space
into view, but will still fire scroll
events (on the other element)
that we can use to update our viewport and then scroll our content
container.
So, we create a second scrollable element, absolutely position it
where the scrollbar is (hiding the real scrollbar), and put an element
inside of it with its height
style set to the same height as the
actual content. Scroll event handlers keep the scrolling position of
the two elements in sync, and fast scrolling with the scrollbar will
go through the dummy element and thus not cause empty space to become
visible.
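The synchronization itself amounts to a pair of scroll handlers, roughly like this (simplified; the real code also guards against feedback loops, and updateViewport stands in for CodeMirror's actual viewport update):
// Keep the real scroller and the fake scrollbar element in sync.
fakeScrollbar.addEventListener("scroll", function() {
  if (scroller.scrollTop != fakeScrollbar.scrollTop) {
    updateViewport(fakeScrollbar.scrollTop); // render newly visible lines first
    scroller.scrollTop = fakeScrollbar.scrollTop;
  }
});
scroller.addEventListener("scroll", function() {
  fakeScrollbar.scrollTop = scroller.scrollTop;
});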
There's some extra machinery needed, for example to make sure the scrollbar is hidden when the main content is not scrollable, but on the whole this wasn't hard to get right.
Enter OS X Lion.
Now a widely used platform has scrollbars that hide themselves when inactive, and are transparent. You can imagine how overlaying a transparent scrollbar over another one does not create a very believable effect. The 'hidden' scrollbar is still visible, and as you scroll it slightly lags behind the non-hidden one, looking flaky.
Which brings us to our next coverup. CodeMirror currently gives its
outer element overflow: hidden
, and its scrolling element (which
lives inside of that) a margin of -30px
and padding of 30px
on the
right side and bottom. This will, in effect, clip off its outer right
and bottom edge, without affecting its inner size. Thus, the
scrollbars are now truly hidden.
If you are thinking I could have hidden the scrollbars by setting
overflow: hidden
on the 'scrolling' element and faking all
scrolling, you're right, and we did go down that road at some point,
but see the section on wheel scrolling below—we want the mouse wheel
to scroll this element, and it doesn't do that for overflow: hidden
nodes.
This trick forces us to also fake the horizontal scrollbar (it wouldn't look right if its rightmost corner was clipped off), but that follows the same principles as faking the vertical one, and is easy to do.
Making scrollbars of different heights correspond to each other is pretty easy.
fakeScrollbar.firstChild.style.height =
(scroller.scrollHeight - scroller.clientHeight + fakeScrollbar.clientHeight) + "px";
The firstChild
of the fake scrollbar is the element used to force
its height. scroller.scrollHeight - scroller.clientHeight
is the
maximum scrollTop
of the scrolling container. We want the fake
scrollbar to have the same maximum scrollTop
(so that one pixel
scrolled there corresponds to one pixel scrolled in the content), so
we set its scrollable height to this maximum scrollTop
plus its own
outer height.
At one point, I was doing the computation of whether horizontal and
vertical scrollbars were needed, and how high/wide they should be,
myself. But that turned out to be much more error prone than simply
inspecting the scrollable container and using its scrollHeight/scrollWidth dimensions, which outsources the computation to the
browser. Here it turned out that leaving the scrollable element as
overflow: auto
has another advantage—since it will have scrollbars
when scrollable, even though those scrollbars are hidden, the formula
above does not need to take the height of the horizontal scrollbar
into account—if one is needed, the browser will already be showing it
on the scrollable element, and adjust its clientHeight
as needed.
I initially believed that wheel scrolling would not be a problem, since it proceeds in small steps, and thus will never jump over the margin of the visible viewport.
Seems I'm not keeping up with technology.
The old clickety-click style of mouse wheel is only one source of wheel events. Though no actual wheel is involved, scrolling by touchpad (or touchscreen) fires the same kind of events, and has the same kind of direct, scrollbar-less scrolling effect.
And such interfaces tend to support 'throw scrolling', where if you scroll quickly and then take your finger(s) off the device, it'll keep scrolling for a bit, potentially very fast. Potentially revealing empty space again.
So we'll have to do something about wheel events as well.
The first, obvious approach was to simply handle the wheel events,
which can be preventDefault
-ed to cancel their scrolling effect, do
the scrolling ourselves, and make sure we update the viewport in the
process.
Don't do that.
Firstly, mousewheel
(and Firefox's DOMMouseScroll
) events come
with a wheelDelta
or detail
property that indicates the amount
scrolled, but there is no clear relationship between this quantity and
the amount of pixels that the browser would scroll for this event.
The delta-to-pixel ratios are wildly different across the various browsers. And not just that. They also vary between versions of browsers, and between specific versions of a single browser run on different operating systems. Possibly (though I didn't reliably verify that) even between browsers run on different hardware or device drivers.
Thus, to know how many pixels you should scroll for a given wheel
event, you'll need a long list of fragile checks against
navigator.userAgent
strings. And, since there is no standard or
default value, you'll still be defenseless against new versions of
browsers, old versions that you didn't test, or obscure browsers that
you've never heard of.
And even if you somehow got that right, your scrolling will still not feel native. Firefox, for example, will smooth-scroll your document as you scroll with the wheel, but will not fire a wheel event for every change in the scrolling position. Some browsers appear to use subtle speedup and smoothing effects in the process of a throw scroll that are not reflected in the deltas they attach to the events. Thus, wheel scrolling that we do ourselves is necessarily less smooth than the native version.
... I know, there are a bunch of projects out there which do their own wheel scrolling, and several libraries that claim to 'normalize' wheel events. But these all seem to set the bar for what 'normalizing' means pretty low, since it was trivial to get obviously inconsistent results from them when testing several browsers and platforms.
(It would certainly be nice if browsers exposed an API powerful and standardized enough to simulate wheel scrolling in a solid way. It's just that they don't.)
So CodeMirror does not override wheel events itself. What then does it do?
It does several things. First, it contains a number of crude, hard-coded, browser-detected constants to use as a first approximation of wheel-delta-to-pixels rates.
Then, it listens to wheel events, but never calls preventDefault
on
them or does scrolling in response to them. Instead, it responds by
setting a timeout to observe the amount of pixels that the wheel event
did scroll the content, and uses that to tweak its delta-to-pixel rate
at run-time.
In addition, it uses its current estimate of this rate to extend the viewport to cover the area it expects the wheel event to scroll into view. If it's wrong, some flickering (visible empty space immediately replaced by content again) might be noticeable, but it does not screw up the scrolling experience by hijacking the wheel events.
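The measuring part of that, in outline (a sketch; the real code smooths its estimate over multiple samples and handles the various event and property names):
// Observe how far a wheel event actually scrolled, and refine the estimate.
var wheelPixelsPerUnit = 30; // crude, browser-detected starting guess
scroller.addEventListener("mousewheel", function(event) {
  var delta = event.wheelDelta, startTop = scroller.scrollTop;
  setTimeout(function() {
    var moved = scroller.scrollTop - startTop;
    if (delta && moved) // blend the observed ratio into the running estimate
      wheelPixelsPerUnit = (wheelPixelsPerUnit + Math.abs(moved / delta)) / 2;
  }, 200);
});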
]]>One feature that was hard to support in the old contentEditable-based version 1, and which was thus built into version 2 from the start, is programmatically styling stretches of text. In version 2, you can call instance.markText(from, to, className), and it'll style that stretch of text with the given CSS class.
By version 2.16, it was actually possible to edit the text around and inside such a marked range without strange corruptions occurring. That version also added a way to query the marked range for its current position (if any) within the document.
Then, last month in version 2.34, marked ranges were integrated with the undo history, so that if you delete a stretch of text that contains marked ranges, undoing the deletion will bring back the ranges, not just the text.
And last week, prompted by use cases from two different customers, I
decided to add a number of rather radical extensions to this API. In
the current code in the v3
(future version 3) branch of the
CodeMirror repository, it is possible to make ranges read-only, to replace the text in a range with a widget, and to collapse (hide) arbitrary stretches of text.
The third one (collapsing) was the tricky part. If a range, which may span multiple lines, can be collapsed, that means that line boundaries are no longer as absolute as they used to be. If a range spanning the end of line one and the beginning of line two is collapsed, content from line one and line two ends up being rendered on what is visually the same line.
This broke a lot of assumptions in the existing code, which required me to completely rewrite a few pieces of the editor (selection drawing, character position measurement) and required subtle changes to many others. The fallout from these extensive changes is probably going to delay the release of version 3 for another month.
But it's worth it!
For one thing, I could simply drop the old line-folding (hiding) system and all its interaction with the rest of the system, since hiding whole lines is simply a special case of hiding arbitrary stretches of text. I've rewritten the code folding add-on to use the new APIs to hide precisely the folded range, and replace it with a little widget that can be clicked to un-fold the range.
The use cases that prompted these extensions are actually quite similar to something that some Emacs packages do: text as a user interface. If you make some text read-only, insert widgets where appropriate, and use styling to make it clear which text is editable, you can provide a rather smooth interface for things like forms, interactive prompts, or even, if you're willing to stretch it, games.
And there you have it. My secret ambition is to replace Emacs, at least for my own use. The concept of writing shells for everything I do in my text editor appeals to me. But the Emacs way of doing that is, unfortunately, firmly grounded in the 1980s (or 70s), and shows few signs of moving into the 21st century. The APIs are just too obscure, the language too slow, and the ecosystem too weird for me.
(A fair number of other parts will have to be put in place for CodeMirror to be viable as a day-to-day editor, but I'll talk about those in some other entry.)
Obviously, this castle-in-the-sky plan is not the only use case for
the marked range enhancements. As a more concrete example, I hope to
soon write an add-on on top of them that allows you to replace
stretches of text matching a specific pattern with a widget, and then
expand them back to regular text as the user moves the cursor into
them. For example, in a LaTeX document, you could replace sequences
like \epsilon
with an actual ε
character, but when you copy a
chunk of text containing that range, you'd still get the original
\epsilon
text, and if you move the cursor into it, it expands into
that text and you can edit it at will. So the presentation of the
document can be enhanced, without actually harming its consistency
as a simple editable plain-text document.
Another one.
Just like:
Acorn produces a well-documented, widely used AST format. The same as the last two parsers in that list.
Acorn is really fast. Just like the last one in the list: Esprima.
Acorn is tiny. About half as big as Esprima, in lines of code.
Still, there's no good reason for Acorn to exist. Esprima is an excellent project, well-documented, and small enough for any practical use. It exposes an interface very similar to Acorn.
The only reason I wrote Acorn is that small, well-defined systems are
so much fun to work with, and that Esprima's web page very
triumphantly declared it was faster than parse_js
, the
implementation in UglifyJS version 1, which is a port of my own
parse-js Common Lisp library.
I just had to see if I could do better.
Turns out I can. Acorn beats Esprima, at least on Chrome, Firefox, and Opera (I didn't test other browsers) by a narrow margin—in the 5-20% range—when not storing source location data, and by a wide one—about five times faster—when storing source location data. See here for a reference. That second number is mostly due to the very unoptimized way in which Esprima manages the flow of this data—it could probably easily do better.
To even get to the point where my parser had this small speed
advantage over Esprima, I had to steal some of its tricks. Most
notably, in order to test whether a string is part of a set (of
keywords, reserved words, etcetera), Esprima uses hand-rolled
predicates that use switch
statements over the string values, with
an outer switch
over the length of the string. Something like this:
function isKeyword(word) {
  switch (word.length) {
  case 2:
    switch (word) {
    case "if": case "in": case "do": return true;
    }
    return false;
  case 3:
    switch (word) {
    case "var": case "for": case "new": case "try": return true;
    }
    return false;
  case 4:
    /* et cetera */
  }
}
I initially expected regular expressions to be faster (using the
test
method), but it turns out that they only are faster on Chrome,
and there only by a tiny margin.
But I wasn't about to write out all these boring predicates myself, so
I defined a function that, given a list of words, builds up the text
for such a predicate automatically, and then eval
s it to produce a
function.
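A simplified version of such a generator (Acorn's actual implementation differs in its details) could look like this:
// Build the source for a length-switching predicate, then eval it into a function.
function makePredicate(words) {
  var byLength = {};
  words.split(" ").forEach(function(word) {
    (byLength[word.length] || (byLength[word.length] = [])).push(word);
  });
  var src = "switch (str.length) {\n";
  for (var len in byLength) {
    src += "case " + len + ": switch (str) { case " +
      byLength[len].map(JSON.stringify).join(": case ") +
      ": return true; } return false;\n";
  }
  src += "}\nreturn false;";
  return new Function("str", src);
}

var isKeyword = makePredicate("if in do var for new try");
console.log(isKeyword("for")); // true
console.log(isKeyword("foo")); // false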
Another big size (and probably speed) saving comes from the fact that
Acorn uses an operator precedence parser, whereas Esprima writes out
all the intermediate forms of binary operators in a long list of
functions named parseMultiplicativeExpression
,
parseAdditiveExpression
, parseShiftExpression
, and so on for all
of the ten precedence levels. Each of these has to be passed through
for each expression parsed.
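The core of an operator-precedence parser fits in a handful of lines. A generic sketch (not Acorn's code; parseUnary, currentToken, and next are assumed helpers):
// Precedence climbing: one loop plus a table, instead of ten nested functions.
var precedence = {"||": 1, "&&": 2, "+": 9, "-": 9, "*": 10, "/": 10};

function parseBinary(minPrec) {
  var left = parseUnary();
  for (;;) {
    var op = currentToken().value, prec = precedence[op];
    if (prec == null || prec <= minPrec) return left;
    next(); // consume the operator
    var right = parseBinary(prec); // higher-precedence operators bind tighter
    left = {type: "BinaryExpression", operator: op, left: left, right: right};
  }
}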
I've set up a project page for Acorn, which is simply the output of docco run on the source code, and a github page for browsing the code and reporting bugs.
]]>Code editors take widely different approaches to the way syntax highlighting styles are defined. An elegantly simple approach is outlined in Patrick Walton's Common JavaScript Syntax Highlighting Specification, basically defining a state machine with regular expressions as its edges. Unfortunately, this proposal was never widely adopted. ACE uses a similar, though incompatible system. Other, more heavyweight, and often downright obscure, systems are found in Emacs, Vim, or Kate.
CodeMirror takes its own unconventional approach to mode definition. It grew more or less organically out of my fondness for crazy hacks at the time I was writing the first version of CodeMirror.
The original approach modeled a mode as a transforming iterator (iterator in the Python sense)—an iterator that lazily got its input from another iterator, and outputted some transformed form of this input. It took the characters in the document as input, and produced a stream of tokens. Inside this iterator, any amount of state and complexity could be hidden (the JavaScript mode was, and still is, almost a full parser).
To avoid having to re-parse the whole document every time a character was typed, such iterators had to support a copy operation, which basically extracted their state so that they could later be restarted on a new input iterator.
This was, all in all, a very neat abstraction, and also, from the outside, easy to work with. Unfortunately such iterators were quite hard to write, and the iterator abstraction that was used, which relies on exceptions to signal end-of-stream, had quite a high overhead—especially if you stack it several levels deep.
CodeMirror 2 addresses those two problems by separating the state from the iterator. Instead of, in order to maintain, copy, and restore state, having to perform weird tricks with closures, a mode author now simply defines a function that initializes the start-of-document state, and a function that takes such a state, along with a character stream, and advances the stream past one token, updating its state if necessary, and returning the style of that token.
This, while not quite as cute as the 'everything is an iterator' model, makes it much easier to write modes. Not just for people who aren't familiar with closures, but also for me. It also makes it much easier to write performant modes, because the abstractions don't call for quite as much indirection.
An example might make this set-up clearer. Here is a very simple mode that highlights only double-quoted strings (which may span multiple lines):
CodeMirror.defineMode("strings", function() {
  return {
    startState: function() {return {inString: false};},
    token: function(stream, state) {
      // If a string starts here
      if (!state.inString && stream.peek() == '"') {
        stream.next(); // Skip quote
        state.inString = true; // Update state
      }
      if (state.inString) {
        if (stream.skipTo('"')) { // Quote found on this line
          stream.next(); // Skip quote
          state.inString = false; // Clear flag
        } else {
          stream.skipToEnd(); // Rest of line is string
        }
        return "string"; // Token style
      } else {
        stream.skipTo('"') || stream.skipToEnd();
        return null; // Unstyled token
      }
    }
  };
});
Let's quickly walk through it. CodeMirror.defineMode
registers a
mode under a given name. It registers a constructor function to allow
configuring the mode when loading it (which this trivial mode doesn't
use).
The mode constructor returns an object containing the functions that
make up this mode. This one defines only startState
and token
,
others often define more, for example indentation
to derive the
correct indentation from a given state.
Our state only holds a single property, inString
. This is needed
because strings may span multiple lines, and CodeMirror tokens can't,
so when finding a string that continues to the next line we have to
communicate the fact that we're still in a string to the next call to
token
.
The token
function first handles the case where we are at the start
of a string (not currently in one, but right before a double quote
character). If so, it consumes the quote and sets the inString
flag.
Next, if we're in a string, we look for a closing quote on this line.
If one is found, we unset the inString
flag. Then we return
"string"
to indicate that the current token is a string.
If we're not in a string, we just skip to either the next double quote
or the end of the line, and return null
, to mark the token (the text
skipped over) as unstyled.
To be able to quickly and efficiently re-highlight a piece of text as
it is being edited, CodeMirror stores copies of state objects in its
representation of the document. Whenever a line needs to be
rendered, it looks for a state object in the lines before it, going
backwards until it finds one or hits the start of the document (at
which point it can call startState
to produce a state). If it has to
go too far back (current limit is 100 lines), it will, to save time,
simply take the line (within that limit) with the smallest
indentation, and assign a blank (starting) state to it.
Once it has a state, it runs the tokenizer over the lines that were skipped, feeding them that state object, which they will update. And finally, it runs the tokenizer over the line itself, and uses its output to assign CSS classes to the individual tokens.
Apart from the process described above, which is synchronous because it is needed to render a line (right now), there's another, asynchronous (background) parsing process that ensures the highlighting stays consistent. For example, a change at the start of a document could open a comment block or string, which would cause the whole document to be styled differently.
The background parser uses setTimeout
along with a check that bails
it out when it has been running for a given number of milliseconds, to
act as a kind of background thread. It keeps a 'frontier', a point up
to which it knows the highlighting is consistent, which is adjusted
whenever the document is modified (or a new mode is chosen). As it
runs, parsing lines and assigning them a state, it adjusts this
frontier forwards.
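The general shape of that technique, doing a bounded slice of work and then yielding back to the event loop, is something like this (a generic sketch, with highlightNextLine standing in for the actual per-line work):
// Time-sliced 'background thread' built on setTimeout.
function startBackgroundParse() {
  function workSlice() {
    var stopAt = Date.now() + 20;       // work for at most ~20 milliseconds
    while (Date.now() < stopAt) {
      if (!highlightNextLine()) return; // frontier caught up; we're done
    }
    setTimeout(workSlice, 50);          // yield, then pick up where we left off
  }
  setTimeout(workSlice, 50);
}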
To preserve memory, the background parser doesn't go beyond the end of the currently visible (scrolled into view) part of the document. That means that, if you open a huge document and never scroll down, no state is accumulated for the whole document. It also means that most background parsing runs are short, since there'll be no more than a few hundred lines between the part of the document that you're currently editing and the bottom of the visible view port.
The string highlighting example above could have been (slightly) more
succinctly written with a regular expression state machine. In fact,
it would probably be a good thing for CodeMirror to come with a
wrapper that adapts such a state machine to CodeMirror's token
function interface. I've been waiting for a common standard
for such specifications to emerge, but not much progress seems to be
made there.
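Such a wrapper is not hard to imagine. In the sketch below (not an existing CodeMirror API), each state is a list of {regex, token, next} rules, and the adapter turns a table of states into a startState/token pair:
// Adapt a regex-based state machine to CodeMirror's mode interface.
function fromStateMachine(states) {
  return {
    startState: function() { return {name: "start"}; },
    token: function(stream, state) {
      var rules = states[state.name];
      for (var i = 0; i < rules.length; i++) {
        if (stream.match(rules[i].regex)) {
          if (rules[i].next) state.name = rules[i].next;
          return rules[i].token;
        }
      }
      stream.next(); // no rule matched; consume one character, unstyled
      return null;
    }
  };
}

// The string-highlighting mode from above, expressed as such a machine.
CodeMirror.defineMode("strings-machine", function() {
  return fromStateMachine({
    start:  [{regex: /^"/, token: "string", next: "string"},
             {regex: /^[^"]+/, token: null}],
    string: [{regex: /^"/, token: "string", next: "start"},
             {regex: /^[^"]+/, token: "string"}]
  });
});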
Still, lots of syntaxes have features that are difficult, painful, or even impossible to express in regular expressions. Take, for example, recognizing the difference between regular expressions and division operators in JavaScript code.
And, even better, when the syntax highlighter is a real program that runs over the code in a well-defined, complete way, you can do all kinds of neat clever things with it. For example, the JavaScript mode recognizes local variables, and colors them differently. The auto-completion demo fetches this information from the mode state and uses it to complete local variable names. The XML mode will highlight mismatched tags for you.
Because modes are simply tokenizers, with a very straightforward
interface, they can be run in different contexts. One example is the
runMode
add-on, which simply runs a tokenizer over a piece of
text and, through a callback, gives you back the tokens.
In fact, the syntax highlighting on this blog is powered by CodeMirror
modes, driven by a browserless node.js version of runMode
.
Another useful consequence of modular modes is that they are easy to
compose. For example the mixed HTML mode composes the
JavaScript, CSS, and XML modes (the latter has a configuration option
that makes it handle HTML). Internally, it initializes all three
modes, and when tokenizing, it multiplexes between them—feeding the
current input to the sub-mode that is active, and switching to a
different sub-mode when it encounters something that looks like a
script
or style
tag.
In fact, there's a utility shim, the mode multiplexer, that makes it easy to combine modes in such a way, when they are separated by fixed, context-independent strings.
Another, similar shim, the overlay mode, combines modes in a different way. It takes a base mode and an overlay mode, and runs both over the whole document, combining their styling information in the actual tokens it outputs. This can be used, for example, to highlight specific characters (say, tabs) in the editor, by overlaying a mode that simply finds such characters and assigns them a style. Or you could write a spell-checking overlay, which looks up each word it finds in a dictionary, and styles the ones it doesn't recognize.
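For example, an overlay that highlights tab characters can be as small as this (assuming the overlay add-on's CodeMirror.overlayMode helper; the "tab" token should come out as a cm-tab CSS class):
// Combine a base mode with a tiny overlay that styles tab characters.
var tabOverlay = {
  token: function(stream) {
    if (stream.match(/^\t+/)) return "tab";
    stream.match(/^[^\t]+/) || stream.next();
    return null;
  }
};
var htmlWithVisibleTabs =
  CodeMirror.overlayMode(CodeMirror.getMode({}, "htmlmixed"), tabOverlay);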
Defining useful code-editor functionality often requires understanding the context at a given position. For example, when matching brackets, you want to match brackets that fulfill the same role in the document. A brace in a string or comment should not match a 'real' brace that actually delimits a block of code.
For that, we simply take the token style of each brace we find as we are looking for a match, and compare it to the style of the brace we started from. Sometimes, it is useful to not only look at the token style, but also the parser state. I mentioned the example of auto-completing local variables earlier.
One area where this is used pervasively is smart indentation. All 'serious' CodeMirror modes keep context information—such as which blocks and parentheses are currently open—in their state objects. From that information, an indentation can easily (and reliably) be derived. Contrast this with the terrifying regular expression hacks that some Emacs modes use to guess indentation levels—which often still get it wrong.
Another example is that, just last week, I was writing a system that needed to display a list of arguments below the editor whenever the user was typing an argument list for a known function. Figuring out that the cursor is in an argument list, and at which argument it is, would be a rather complex task, involving knowledge of brackets and quotes, if I had to do it myself. But I could simply rig the mode, which was already tracking contexts, to store that information in its state, and get easy (and fast) access to it whenever the cursor was placed in a new position.
]]>compareStates is no longer needed.
onHighlightComplete no longer works.
There is now a CodeMirror.version property.
See github for a full list of patches. Get the zip file from the website.
2.34 will be the last 'full' release on the 2.x branch. I will continue to bring out bugfix releases on that branch for at least two more months, but new work will, from now on, happen on version 3.
The first beta version of CodeMirror 3 also came out today. The jump to version 3 is mostly a result of some of the major work I did last month, that the community generously sponsored. Some of that work required incompatible API changes, and those changes landed in version 3 rather than the 2.x branch. The current beta has no known major problems (issue list for v3 milestone), but contains a lot of new code, and a serious overhaul of the old code, so I would not recommend using it in production yet.
I would be very thankful for any testing, of the editor in general and especially of the new features. I've written an upgrade guide that describes what changed, what's new, and how to adjust your code to it.
The zip file for version 3 is on the website, and its development takes place on the v3 branch in the git repository.
]]>The problem it tackles is this: you are writing a JavaScript control that needs to act as a text input field—it must be focusable, support copy and paste, receive typed input—but really isn't. I.e. you want to draw it yourself, and have full control over its content.
In this post, I won't talk about drawing a cursor, maintaining your own selection, and similar. Those are also required to present a convincing text input, of course. But they are relatively straightforward to implement.
The crux of my solution, the initial inspiration for which I got from the ACE editor, revolves around a hidden textarea node. This is the thing that the browser believes is focused when the editor looks like it is focused. It'll behave like a regular focusable object: you can assign a tabindex to it, and it will receive focus and blur events when it gains or loses focus, allowing us to update the style of our editor (show/hide cursor, color/grey selection) to reflect its focused state.
This textarea must, obviously, not be visible. The suspension of disbelief required for an editable control to feel real is completely ruined when there's a textarea sitting next to it, with its own blinking cursor.
However, if you give the textarea display: none or visibility: hidden, the browser decides that it is not really part of the document and will refuse to focus it. CodeMirror gets around this by wrapping the textarea (made small) in an overflow: hidden; height: 0 element (a div). That makes it both invisible and focusable. (You'll also want an outline: none style, to prevent some browsers from helpfully showing a glow around it when focused, which is for some reason not clipped by the overflow: hidden.)
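A hedged sketch of this setup (not CodeMirror's exact code; editorElement stands in for the editor's outer DOM node):

var hider = document.createElement("div");
hider.style.cssText = "position: absolute; overflow: hidden; height: 0; width: 0;";
var input = document.createElement("textarea");
input.style.cssText = "width: 1000px; height: 1em; padding: 0; border: 0; outline: none;";
input.setAttribute("tabindex", "0");
hider.appendChild(input);
editorElement.appendChild(hider);

// The editor looks focused exactly when the hidden textarea is focused.
input.addEventListener("focus", function() { /* show cursor, color the selection */ });
input.addEventListener("blur", function() { /* hide cursor, grey the selection */ });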
Another unfortunate (or fortunate, depending on your perspective) effect of a focused editing control is that the browser will scroll it into view every time it notices activity in it. This means that, if the hidden textarea simply sat at the top of the editor, and you had that top scrolled out of view because you were editing something near the bottom of the editor, your window would scroll up every time you typed a character.
CodeMirror absolutely positions the div element it uses to hide the textarea, and moves it around to always line up with the cursor. That way, it actually helps scroll the real cursor into view.
When the user has selected some text, and performs a copy or cut action, the correct text should be placed on the clipboard.
This means that the selected text must be present in the textarea, and selected. There are two approaches one can take here. The first, taken by ACE, is to listen for copy and cut events (which are fired before the actual copy or cut takes place), and only when such an event is detected, insert the currently selected text into the textarea and select it.
CodeMirror's approach is less clever, but more robust. It simply always places the currently selected text into the textarea (selected). The downside is that setting and getting a textarea's value property when it contains a lot of text is slow. If you put a huge document into CodeMirror and press ctrl-A / cmd-A (select all), there'll be a noticeable pause. (On some old browsers, depending on the size of the document, it can actually look more like a browser freeze than a noticeable pause.)
The advantage of this approach is that it works on Opera, which doesn't fire copy and cut events, and that, on Linux, on some browsers, it'll play nice with the X Windows selection clipboard (i.e. middle-mouse-button paste). CodeMirror takes some care to minimize the amount of textarea.value traffic it produces, for example by not updating the value during a selection drag, but only when the drag is finished.
Update: It turned out to be easy to wire up CodeMirror to perform the same trick as ACE—only putting in the whole selection when a cut or copy happens—but only do it when the selection is actually big, and we're on a browser that fires copy and cut events. For small selections, the X Windows menu will still work, yet the pathological case of select-all in a huge doc is only costly when the resulting selection is copied.
So the hidden textarea contains the current selection, and has its content selected. That means that when the user types something, or pastes text, the content of the textarea will be the inserted text (overwriting the previous selection, if any), which can then be inserted into the real document at the cursor position.
But who will tell us when input happens? For a start, we can listen to events like keypress, paste, input, and even mouse events. Those'll tell us that something might be about to go down. So we set a timeout, and check the content of the textarea a few milliseconds later.
But that isn't perfect. Opera doesn't fire paste—and when you paste from the menu, there also aren't any mouse events being fired. Furthermore, IME input, on some browsers, can cause the content of the textarea to be updated without any event being fired.
So we must poll too. And polling could get expensive, if we do it a lot and have to read the (potentially large) value of the textarea every time. Fortunately, we are helped by the fact that if the textarea has a big value (the selection), that value will be selected, and entering input would overwrite it. Thus, if the textarea has a selection (which is cheap to check), its value does not have to be read. This makes polling cheap, and allows CodeMirror to poll intensively when it is focused without eating up too many CPU cycles. (It stops polling when unfocused.)
I mentioned IME, Input Method Editor. I am not an expert on it, since I don't speak any language that requires it to be used. But, in brief, it allows people who write in scripts that have too many characters to fit on a keyboard to use sequences of key strokes to create characters. It usually operates by showing the current result of the composition in the editable control, and then replacing it with the updated result as more keys are pressed.
If CodeMirror were to clear the textarea every time it reads input, that would throw off partially finished IME input. So what it does is, when no selection exists, to leave the current input in the textarea, and store its value somewhere. Then next time when it polls, it compares the new value of the textarea to the previous value, discards the common prefix string, and uses what remains of the new value as the text to insert. If something also remains of the old value (after discarding the common prefix), that means that part of the previous value was replaced by new text, and the new input should replace those old characters in the document.
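A hedged, simplified sketch of that polling and diffing logic (replaceSelection is an assumed editor-specific helper that applies the new text to the document):

var prevInput = "";

function readInput(input) {
  if (input.selectionStart != input.selectionEnd) return; // still selected: nothing typed, skip reading the (possibly large) value
  var text = input.value;
  if (text == prevInput) return;                          // nothing changed
  var same = 0;                                           // length of the shared prefix
  while (same < text.length && same < prevInput.length &&
         text.charAt(same) == prevInput.charAt(same)) same++;
  var inserted = text.slice(same);
  var replacedOld = prevInput.length - same;              // old characters that were overwritten (IME case)
  replaceSelection(inserted, replacedOld);
  prevInput = text;
}

// Poll intensively while the editor is focused, for example:
// setInterval(function() { if (focused) readInput(hiddenTextarea); }, 100);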
Modern browsers provide a weird but useful drag-and-drop API. It is easy for an editing control to hook into this to support dropping text into the editor and dragging text out of it. There are a few subtleties. Here is CodeMirror's dragstart handler:
on(node, "dragstart", function(e) {
  // Set the dragged data to the currently selected text
  e.dataTransfer.setData("Text", editor.getSelection());
  // Use a dummy image instead of the browser's default drag image.
  if (e.dataTransfer.setDragImage)
    e.dataTransfer.setDragImage(document.createElement('img'), 0, 0);
});
The setDragImage call, which in effect suppresses the default drag image, is needed to prevent some browsers from showing the whole editor being dragged around, because the outer element was set as draggable=true.
In CodeMirror's mousedown handler, I also preventDefault() clicks that are not inside of the selection, so that dragging to create a selection does not trigger text dragging. On Webkit, it is necessary, in addition to that, to only set the draggable attribute to true when handling a mousedown event that actually looks like a drag, and to set it back to false afterwards.
The drop event handler for an editor can do the song and dance with a FileReader to also handle files being dropped into the editor.
As the icing on the cake, an editing control should behave properly when right-clicked. The context menu should contain working 'copy', 'cut', and 'paste' items.
Unfortunately, there is no API for hooking into context menus. You can capture the click and display your own menu, but that is very lame, and, what's worse, you won't have access to the clipboard so you can't even properly implement copy/paste functionality.
As usual in browser land, there's a horrible kludge to be found to make up for the lack of APIs. In this case, we can respond to a mouse click or contextmenu event by briefly unhiding the textarea (giving it a low opacity and no borders in order to not draw attention to it), and placing it under the mouse cursor.
Since the textarea already contains the current selection, and its top left corner (which we place under the mouse cursor) is where that selection is located, the browser now believes that we clicked on the textarea's selection, and will provide the menu items we want. Even if the node is hidden again after a few milliseconds, the click will have been associated with it, and a subsequent paste will still be applied to our textarea.
One issue is that Firefox will fire the contextmenu event after opening the context menu, at which point it is too late to trick it into believing the textarea was clicked. So on that browser, we trigger the kludge from the mousedown handler instead (given that the right button was pressed).
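A hedged sketch of the trick (not CodeMirror's exact code, and the details differ per browser): briefly park the barely-visible textarea under the pointer so that the native menu's copy, cut, and paste items apply to it.

function fakeContextMenu(input, e) {
  var oldCSS = input.style.cssText;
  input.style.cssText = "position: fixed; top: " + (e.clientY - 5) + "px;" +
    " left: " + (e.clientX - 5) + "px; width: 30px; height: 30px;" +
    " opacity: .05; border: none; outline: none; z-index: 1000;";
  input.focus();
  // Hide it again shortly afterwards; the menu stays associated with it.
  setTimeout(function() { input.style.cssText = oldCSS; }, 50);
}
// On Firefox, call this from the mousedown handler when the right button is
// pressed, since the contextmenu event fires too late (see above).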
A fourth item from the context menu that we'd like to support is 'select all'. To do this, we add a bogus space at the start of the content, which is not selected, and then poll for a while to see whether this space became selected. If it did (with the rest of the content still intact), we select everything in the editor. If something else changed about the textarea, or some amount of time went by, we give up.
For non-input keyboard events, such as cursor movement keys, CodeMirror simply handles the raw event itself and performs the appropriate operations on its internal selection representation.
The initial CodeMirror version 2 used a different approach, which was cute but in the end didn't work out. It put not only the selection, but also a few lines around that into the textarea, and left local cursor movement up to the browser. It would not just get input from the textarea, but also selection information.
This had the advantage of using the 'native' key bindings of the browser. It would work for custom key bindings, and outsource some of the complexity of selection handling to the browser.
It was abandoned because it required a lot of hacks to get working. For example, you can't set the selection's anchor when setting a selection on a textarea. The anchor is the side that does not move when you press shift-left (or any other shift-motion). Browsers assume it to always be the left side of the selection when the selection is set through setting selectionStart and selectionEnd. Making the selection behave properly when it was in fact inverted (the anchor was the rightmost side) involved some painful and brittle kludges.
Additionally, it seems that very few people actually reconfigure the keyboard bindings in their browsers (browsers don't make it very easy to do so), and, interestingly, people were more interested in providing custom bindings for CodeMirror than in reconfiguring their browsers.
In the end, the extra complexity of handling our own keyboard events turned out to be less than the complexity that came with the approach outlined above. So CodeMirror moved to the model where the textarea holds only the selection, and never has to deal with cursor-motion key presses.
I originally structured CodeMirror instances as one huge closure that contained all the internal variables. The constructor would create local variables for all internal state, and local functions for everything that needed access to that state, and then return an object that contained the API methods, which themselves also closed over all those internals. Something like this:
function CodeMirror(args) {
  // Internal state
  var doc = something, selection = somethingElse;
  // Internal functions
  function modifyDoc(from, to, newText) {
    /* directly access doc, selection, etc */
  }
  function drawDoc() { /* ... */ }
  return {
    getLine: function(n) { return getLineFrom(doc, n); },
    refresh: drawDoc
    /* etc */
  };
}
I had several reasons for doing it like this. Firstly, it minifies well—local variables are very easy and non-invasive to rename, so if most of your fields and functions are regular variables rather than object fields, simple minification works very well. There are tools like Google's Closure compiler in 'advanced' mode, which do try to rename properties, but those are harder to use (they need to be told which properties they may rename, and which are exported or come from external components, such as .style.display on a DOM node).
Secondly, it makes for uncluttered code—I can write foo instead of this.foo or this.view.doc.foo. That really does make a big difference in the overall succinctness of the code.
Thirdly, and lastly, I believed that this would be faster. I reasoned thusly: a lexical variable in a closure is in a static, known-at-compile-time place. The closure data is allocated as a structure with a known layout, and the closing function will have a pointer to that structure in a known place. Thus, a reference to a closed-over variable involves little more than following that pointer to the environment structure and loading the variable from its fixed offset.
That sounds like two, three instructions at most, plus maybe two more to get at the current function's data. In case it's a nested closure, where the current function closes over an environment that in turn closed over the variable we're getting at, that'd add another hop, but still, nothing more than a simple following of pointers is involved.
Compare that to accessing an object field. This has been the target of much optimization work, since it used to be one of JavaScript's main bottlenecks, but it still requires a baseline amount of work. Assuming the JavaScript engine implements polymorphic inline caching, which all relevant ones do at this point, you'll still have to check the object's hidden class (shape) against the one recorded in the inline cache, and only then load the property from the cached offset, falling back to a slower lookup when the check fails.
In my mind, this might come close to the speed of accessing a closed-over variable, but would definitely not surpass it.
However, benchmarks, both micro and a more elaborate one that I'll discuss in a moment, do show that on modern engines, object access is in fact faster than closure access.
I don't have an explanation for this. I'd be happy if someone could enlighten me on the subject. My current assumption is that the people working on these engines have been so busy optimizing object access and Sunspider performance that, unlike compiler implementers in most other functional-language communities, they simply haven't given closure access the same attention.
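A rough micro-benchmark sketch along those lines (illustrative only; the names are mine, and a serious benchmark needs far more care than this):

function makeClosureCounter() {
  var count = 0;
  return function() { return ++count; };
}
function ObjCounter() { this.count = 0; }
ObjCounter.prototype.inc = function() { return ++this.count; };

function time(label, fn) {
  var start = Date.now();
  for (var i = 0; i < 1e7; i++) fn();
  console.log(label + ": " + (Date.now() - start) + "ms");
}

time("closure", makeClosureCounter());
var counter = new ObjCounter();
time("object", function() { return counter.inc(); });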
I spent the past few days on an experiment. I rewrote CodeMirror's core to use a set of objects rather than a single big closure for its central state. I have a version that passes the test suite now (though, since this change involved touching pretty much every single one of the almost 2000 lines that lived in that monster closure, there's probably some breakage that I haven't caught).
The reasons for this mostly had to do with modularity and code clarity. Closures are, in a way, too convenient. I can just define a new variable to hold some state, access it from everywhere, and it'll work. This is good, in most respects, when the system modelled by the closure is small (to medium) in size. But CodeMirror had long ago crossed the threshold where the proliferation of stateful local variables became hard to see through. Grouping them into a set of objects with well defined roles and lifetimes definitely made the data model easier to understand and reason about. It also, by lifting all internal functions out of the closure, forces me to specify the inputs that the functions act on in their argument lists.
The overhauled implementation did become noisier, obviously. Field access had to be prefixed by object names all over the place, and it is often necessary to create locals like var doc = cm.view.doc; to avoid typing out the chain of property accesses twenty times in a function.
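A hedged sketch of the new shape (the names are illustrative, not CodeMirror's actual internals): state lives on objects, and the former local functions take the editor instance as an explicit argument.

function CodeMirror(args) {
  this.view = {doc: something, selection: somethingElse};
}
CodeMirror.prototype.getLine = function(n) {
  return getLineFrom(this.view.doc, n);
};

function modifyDoc(cm, from, to, newText) {
  var doc = cm.view.doc; // local alias to avoid repeating the property chain
  /* operate on doc, cm.view.selection, etc */
}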
These are the file size numbers, in bytes, using UglifyJS for minification:
Full Minified Min+gzip
Closure 150453 65381 22733
Objects 154752 74655 24448
So the raw file became 2.7% bigger, the minified file 12.4% (!), and the gzipped file 7.0%. Zipping absorbs some of the extra size of the minified version, because the repeated property names are exactly the kind of thing that compression is supposed to handle well, but there still remains a significant bloat of the file size.
My conclusion: closures really do help with minification, but not to a huge enough extent to justify sticking to it for the CodeMirror project.
Finally, performance. I only did some ad-hoc benchmarks on Chrome 21 and Firefox 15, comparing the old closure-based CodeMirror codebase to the equivalent, object-using one. But those showed, on both browsers, a consistent speedup in the 2% to 3% range. Thus, my intuition about closure access being fast was once again debunked.
The initial implementation of CodeMirror 2 represented the document as a flat array of line objects. This worked quite well—splicing arrays will require the part of the array after the splice to be moved, but this is basically just a simple memmove of a bunch of pointers, so it is cheap even for huge documents.
However, in version 2.17 (November 2011), I added support for line wrapping and code folding. Once lines start taking up a non-constant amount of vertical space, looking up a line by vertical position (which is needed when someone clicks the document, and to determine the visible part of the document during scrolling) can only be done with a linear scan through the whole array, summing up line heights as you go. One of the design goals of CodeMirror is to make editing responsive even in huge documents. So this is not an acceptable solution.
The operations that an effective document representation must support are: looking up lines by line number; looking up lines by vertical position (for example, when figuring out where in a document a mouse click happened, or which lines are visible given a vertical scroll position); and the reverse of those two operations (going from a line object to a line number or vertical offset). Furthermore, inserting and deleting lines, as well as updating the height of a line, should be cheap operations.
Anyone with a single computer science course under their belt will recognize this as a problem that calls for some sort of tree representation.
A regular binary tree would work. But the kind of update operations that we should be worried about are big ones—pasting a huge chunk of text, or selecting a few thousand lines and then pressing delete. All balanced binary trees that I'm familiar with define only single-element insertion and deletion operations, which would have to be repeated a huge amount of times in the case of such mass updates.
We'd also prefer to keep tree depth to a minimum, because we'll be traversing this tree to find a line node or to update a line's parent nodes a lot—conversions between line numbers and line objects are rampant, because both describe essential properties of a line. (The number cannot be stored in the line object, because that would require every single line object to be updated whenever someone presses enter near the top of the document.)
The new representation is based on a B-tree. These have the wide branching factor (and thus shallow depth) that we need, and lend themselves very well to bulk updates (more on that later).
The leaves of the tree contain arrays of line objects, with a fixed minimum and maximum size, and the non-leaf nodes simply hold arrays of child nodes. Each node stores both the number of lines that live inside of it and the vertical space taken up by these lines. This allows the tree to be indexed both by line number and by vertical position, and all access has logarithmic complexity in relation to the document size.
Because both of these index keys (line number and vertical position) increase monotonically, a single tree can be indexed by both of them. This is great: it gives us the height index almost 'for free', with no additional data structure and only a very small amount of extra logic (maintaining the heights on updates).
Below is an illustration of what a tree might look like. This is a document of 50 lines, where the root node contains two children—one is branching chunk containing a number of leaf chunks, and the other is itself a leaf chunk. The first leaf has been written out, it contains seven lines, of which two are folded (taking up no height), and one is wrapped (taking up more height than a regular, unwrapped line).
root (the document) (size: 50, height: 470)
├─ chunk1 (size: 35, height: 300)
│ ├─ leaf1 (size: 7, height: 70)
│ │ ├─ line1 (height: 10)
│ │ ├─ line2 (height: 10)
│ │ ├─ line3 (wrapped, height: 30)
│ │ ├─ line4 (height: 10)
│ │ ├─ line5 (folded, height: 0)
│ │ ├─ line6 (folded, height: 0)
│ │ └─ line7 (height: 10)
│ ├─ leaf2 (size: 10, height: 110)
│ │ └─ ...
│ └─ ...
└─ leaf3 (size: 15, height: 170)
└─ ...
The size of the root node indicates the number of lines that the document contains (and its height indicates the height of the whole document).
If we wanted to find line 12, we'd descend the root node, looking at its child chunks. The first child has size 35, so that's the one that contains line 12. Inside of this chunk, the first child is only of size 7, so we skip that, keeping in mind that we've seen seven lines, and the offset that remains is 12-7=5. The second chunk has size 10, which is more than 5, so we look inside that chunk. It is a leaf chunk, which means that its content is flat, and we can simply grab the line at offset 5 inside of it.
For an interactive visualization of this tree, see this demo on the CodeMirror website.
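A hedged sketch (not CodeMirror's exact code) of looking a line up by number: leaf chunks hold a lines array, branch chunks a children array, and every chunk knows its total size (line count).

function lineAtIndex(chunk, n) {
  while (!chunk.lines) {                    // descend until we hit a leaf
    for (var i = 0; i < chunk.children.length; i++) {
      var child = chunk.children[i];
      if (n < child.size) { chunk = child; break; }
      n -= child.size;                      // skip this child, adjust the offset
    }
  }
  return chunk.lines[n];
}
// Looking a line up by vertical position works the same way, except that it
// subtracts child heights instead of child sizes.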
The interface for deleting and inserting line objects in a tree is defined in terms of ranges of lines, rather than individual lines. To insert a range of size N at position P, we walk down the tree to find the leaf that contains position P. We then insert the whole range into the leaf. If this makes the leaf too big (there's a fixed maximum size defined for leaves), one or more new leaves will be split off from it, and inserted into its parent. If this, subsequently, makes the parent (non-leaf) chunk too big, that one is also split, and so on. If the root node needs to be split, a new root is created to hold the resulting chunks.
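A hedged sketch of inserting lines into a leaf and splitting it when it grows too large (makeLeaf and insertChildAfter are hypothetical helpers, and height bookkeeping is omitted to keep the sketch short):

var MAX_LEAF = 25; // assumed maximum leaf size, for illustration

function insertLines(leaf, at, newLines) {
  leaf.lines.splice.apply(leaf.lines, [at, 0].concat(newLines));
  leaf.size += newLines.length;
  while (leaf.lines.length > MAX_LEAF) {
    // Split off the second half into a new leaf and hand it to the parent,
    // which may in turn split itself, percolating growth up towards the root.
    var spilled = leaf.lines.splice(leaf.lines.length >> 1);
    leaf.size -= spilled.length;
    insertChildAfter(leaf.parent, leaf, makeLeaf(spilled));
  }
}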
The beauty of B-trees is that this simple and cheap algorithm automatically balances the tree—when a branch grows, instead of growing downwards, its surplus population percolates upwards, towards the root, and causes the tree to grow from the root when it needs to. This is not a perfectly optimal balance, as in some other kinds of trees, but it is definitely good enough for an editor implementation.
To delete a range of lines, the deletion simply recursively enters the branches that contain parts of the deleted range, and, in the leaf chunks, removes the relevant lines (updating size and height in the process). When a chunk becomes empty, it is simply removed completely, and when a branch chunk's size drops below a certain threshold, it is replaced by a flat leaf chunk. Again, this doesn't result in a perfect balance, but it is wonderfully simple. In fact, it doesn't even completely protect against pathological cases—there are editing patterns that can result in a seriously unbalanced tree. But those, since the unbalancing happens during deletion, can still only be as deep as the original tree (created by insertion, which has better balancing characteristics) was, and thus can't reach dangerous depths.
All line objects and tree nodes (except the root) have parent pointers to the node above them. This allows for very fast height updating—when a line changes, find the height delta, and simply walk its parent pointers adding this delta to their height—and finding the line numbers of a given line object.
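A hedged sketch of both uses of the parent pointers (illustrative code, not CodeMirror's):

function updateLineHeight(line, newHeight) {
  var delta = newHeight - line.height;
  line.height = newHeight;
  for (var node = line.parent; node; node = node.parent)
    node.height += delta;                      // keep every ancestor's total in sync
}

function lineNumber(line) {
  var no = 0, cur = line, node = line.parent;
  for (; node; cur = node, node = node.parent) {
    var siblings = node.lines || node.children;
    for (var i = 0; siblings[i] != cur; i++)   // count everything that comes before us
      no += node.lines ? 1 : siblings[i].size;
  }
  return no;
}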
Maintaining these links, and breaking them when lines are dropped from the document, is somewhat awkward, but having such logic in place turned out to be useful for other purposes as well. It allows CodeMirror to fire an event when a line is dropped from the document, gives an easy way to check whether a line is still live (it is when it has a parent pointer), and makes it more immediately obvious when a data structure is not being maintained consistently (the code will quickly try to follow a null pointer).
I had heard that all the cool kids are now using Jekyll on their Github pages to publish their blogs. I am not keen on depending on Github for yet another aspect of my online life, but the idea of generating a static site from a git repository does sound appealing.
Setting up a simple site with Jekyll was a breeze. It really is a well-designed approach. But I also immediately ran into its limitations. Something as simple as sorting my list of tags by the number of posts they contain was... apparently not possible without monkey-patching some classes from a plug-in.
Now I have all the respect in the world for the Ruby community and their anarchist approach to modularity, but such shenanigans just don't fit my own sense of aesthetics. On the other hand, templating languages like Liquid, which Jekyll uses, are not nearly anarchist enough for my taste—they strictly forbid any kind of interesting logic to be placed in the template. I'm sure this is a good thing in some projects with some teams. But having to add code in another, largely unrelated place, just to be able to sort a stupid list in a certain way is not helping.
It turned out that, because of Jekyll's brilliant simplicity, cloning it was easier than figuring out how to monkey-patch it.
Several hours, and two hundred lines of code later, I present to you: Heckle, a node.js-based Jekyll clone.
It obviously doesn't have all the features of Jekyll—only the core things that I needed to generate this site. It converts Markdown files with a YAML front matter to HTML using templates from the _layouts and _includes directories. It finds posts in the _posts directory and understands what tags mean. And it copies all other files in the working directory over to _site, where the output ends up, and which you can then point your web server at.
For templates, Heckle uses a modified version of Mold, which was designed for unrestrainedly mixing JavaScript logic into templates. It was originally designed for client-side instantiation, and I had to fix some limitations that made it work poorly with node.js. Those'll probably soon be integrated in the main repository.
The sources for my new blog are on Github. I imported some old posts to make it look less empty. With luck, I'll actually form a blogging habit this time around.
Fortunately, toolkits and libraries are able to hide the horrors of combining characters, directionality, and word breaking most of the time. Today, most software has moved beyond the ASCII-only worldview, and makes at least an effort to handle these things properly. You throw strings at it, it displays them correctly for you.
But there are situations where that doesn't suffice. CodeMirror is a code editor implemented in JavaScript. It relies on the browser to display its content, and modern browsers are very good at displaying text. But it also displays a cursor, and controls its movement. To do that, it needs to be aware of some non-trivial properties of Unicode text.
In this article, I'll outline the solutions I came up with. I was able to find very little useful material on the subject online. It should be noted that I am in no way an expert in this field, and that I had to take a number of shortcuts to keep the size and complexity of my editor library—which, as a JavaScript program, is downloaded by every user—within bounds. I am also not an Arabic or Hebrew speaker, and as such have very little experience with bi-directional editing interfaces. Remarks and corrections are very welcome.
Originally, CodeMirror assumed that each character in a line represented a glyph, and that these glyphs were shown left-to-right. This means that when, for example, the right arrow was pressed, the editor could simply move its cursor position one character towards the end of the string, and all was well.
But some Semitic scripts, notably Arabic and Hebrew, do not start writing on the left of the medium, but rather write right-to-left. Now if we had to deal only with lines that were entirely right-to-left (or left-to-right), that would be relatively easy—just move the cursor towards the start of the line when the right arrow is pressed, since a lower index represents a more rightward position in the visual representation of the line.
Unfortunately, things are not that easy. Firstly, there is nothing preventing people from mixing right-to-left and left-to-right scripts in a single line. Secondly, a group of digits ("Arabic numerals"—the ones we use in the West), when occurring in a piece of Arabic text, are to be rendered left-to-right, within their right-to-left context.
Let us look at an example. Assume that upper-case characters are Latin, and lower-case ones Arabic. If a string looks like this (logical order):
A B C a b c 1 2 3 d e D E (logical)
It is rendered like this (visual order):
A B C e d 1 2 3 c b a D E (visual)
The Arabic range (a to e) is flipped, and within that, the number (123) is flipped once more.
Deriving a visual order from a string isn't magic—there's a well-formalized algorithm for this published by the Unicode Consortium. It, in brief, proceeds by categorizing the characters in the string into categories like "Left-to-Right", "Right-to-Left Arabic", "Whitespace", and a number of other ones. It then performs a bunch of operations that reduce one category to another based on its context, for example reducing the category of "Non-spacing marks" to that of the character before it. Finally, when only a few categories remain, it builds up a visual order by 'flipping' sequences of characters with a right-to-left category, and within those, flipping sequences of digits back again.
I won't go any deeper into this algorithm. It is well documented. It in fact also declares a mechanism for inserting RTL and LTR marks, which explicitly control the direction of the text. CodeMirror's implementation does not currently implement this part of the algorithm.
The fact that bi-directional text has 'jumps' in it—positions where visually adjacent characters are not actually adjacent in the logical representation—has some interesting ramifications for editable text.
(Note that, though I am going to describe a behavior as if it were normative, this is just what most non-Windows software seems to be doing, and in fact there are other ways to handle bi-directional editing.)
When the cursor is at such a jump, for example at position 3 in the example string, as illustrated below, it defies some of the assumptions that underlie classical, single-direction cursor interfaces.
A B C a b c D E (logical)
0 1 2 3 4 5 6 7 8
(The numbers are the indices into the string that are used to represent cursor positions.)
When you type a D (Latin letter) at position 3, all is well—a letter is inserted to the left of the cursor, and the cursor moves to the right to end up after the new letter. Same if you press backspace there—the C is simply deleted and the cursor ends up after the B.
But, if you insert a character from a right-to-left script, say an x (which you should read as being an Arabic, right-to-left character), you end up with the string ABCxabcDE, and the x will appear, in the visual order ABCcbaxDE, quite some distance from the cursor. Similarly, when you press delete, you'll delete the a rather than the c which is visually to the right of the cursor.
What Chrome does in such a situation, and what I've followed in CodeMirror, is to show a secondary cursor at the other end of the jump. So, visually, you'd see this, with the asterisks indicating the primary cursor and the plus sign the secondary one.
A B C c b a D E (visual)
* +
Now, at least you'll get a visual hint that something is not normal, and have a quick way to see where else your editing actions might take effect.
We will assume that we want the arrows on the cursor motion keys to match the direction that the cursor actually moves when you press them (this is not standard on Windows, where many programs move the cursor 'logically' when you press arrow-left and arrow-right, causing it to move in the opposite direction from the arrow when in right-to-left text).
To do this consistently, we define an ordering of (primary) cursor positions. This ordering must have two properties: it must correspond to the visual order of the line (i.e. a position more to the right in this order is more to the right on the screen), and it must include every possible cursor position in the string (it would be bad if there were positions that you can't reach with the cursor keys). In regular left-to-right text, this ordering is trivial. In a three-character string, it would be 0123 (where 0 is before the first character, and 3 is after the last). In a fully right-to-left string, it is simply the inverse of that, 3210. The fun starts with bi-directional strings.
The cursor-position-order does not follow trivially from the character display order, because it talks about positions between characters. This includes assigning an ordering to jump positions. More concretely, here's an example. First, it shows the logical string, with its possible cursor positions labelled, and then below it, it shows the corresponding visual order and a possible ordering of character positions (the numbers refer to string offsets, their position reflects their ordering):
A B C a b c D E (logical)
0 1 2 3 4 5 6 7 8
A B C c b a D E (visual)
0 1 2 3 5 4 6 7 8
This'd mean that when you are at position 3, pressing the right arrow takes you to position 5, and pressing it again takes you to 4.
This ordering is mostly uncontroversial, except for the positions of 3 and 6—we could also have flipped them, so that the user would already be taken to position 6 (the leftmost end of the right-to-left section) after pressing right from position 2.
Whether either of these orders satisfies the 'corresponds to the visual order' restriction depends on how we draw the primary cursor. At position 6, we could emphasize that it sits at the rightmost end of the abc right-to-left section, and draw it to the left of the c, or we could emphasize that it sits right before the DE left-to-right section, and draw it to the left of the D.
Both work, but I've found that the least confusing behavior occurs when biasing cursor positions towards the dominant direction of the line (which CodeMirror defines to be the direction of the start of the line, but you could also base it on the percentage of characters that is right-to-left). So that means that in a line that starts with left-to-right text, when the cursor is on a jump point, the primary cursor is drawn relative to the character at the left-to-right side of the jump, and the secondary one relative to the right-to-left side.
Thus, in this schema, we'd reflect this bias by using the order shown above, rather than the one where 3 and 6 are swapped (which would amount to biasing towards the right-to-left text, which is not the dominant direction of this line).
Depending on what you are doing, a display order can be represented in various ways. For cursor placement, drawing of the selection, and cursor motion, I found it most practical to use a format that lists the individual single-direction sections of text, from left to right in display order, and for each section tells me its direction, its start point, and its end point (in logical order offsets).
For cursor drawing, this allows us to find the section that the cursor is inside of, in which case it is simply drawn between the characters that are adjacent to it, or the two sections that it sits between. In that second case, we place the primary cursor relative to the section whose direction corresponds to our dominant direction, and the secondary cursor relative to the other.
Selection drawing has to handle selections that look visually discontinuous because of jumps. For example if, in the example string that mixes numbers and right-to-left text, you select from position 1 (between A and B) to position 8 (between 2 and 3), the selection marker should cover the part shown by asterisks:
A B C a b c 1 2 3 d e D E (logical)
*************
A B C e d 1 2 3 c b a D E (visual)
*** *** *****
Drawing this is easily done by iterating over the sections that make up the line, and checking for each whether it overlaps the selection. If so, draw the part of the selection that falls inside the section by using coordinates relative to the section.
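A hedged sketch of that iteration, assuming the section format described above (visual order, each section with logical from/to offsets) and a drawPart callback supplied by the caller:

function drawSelection(sections, selFrom, selTo, drawPart) {
  for (var i = 0; i < sections.length; i++) {
    var part = sections[i];
    var from = Math.max(part.from, selFrom), to = Math.min(part.to, selTo);
    if (from < to)
      // Overlap found: draw it using coordinates relative to this section.
      drawPart(part, from, to);
  }
}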
Finally, cursor movement, done in steps of one, starts by, just like cursor drawing, finding the section that the start position sits in or between. If it sits between sections, the section with the dominant direction is chosen as current section.
We then move one character in the intended direction. If we are in a right-to-left section, this is the inverse of the specified direction (i.e. left, which is normally -1, towards zero, becomes 1, towards the end of the string).
If this motion takes us out of our current section, where 'out of' is defined as beyond the section's edge for sections of the dominant direction, and onto the section's edge for non-dominant sections, we need to skip to the next section (in the visual order), entering that one on the correct side (i.e. the visual right side when moving left, left side when moving right, where the offset corresponding to that side depends on the section's own direction). If the new section is non-dominant, we skip its edge, since that offset belonged to the origin section.
The above step may have to be performed multiple times, to allow moving through single-character non-dominant sections. It stops when we find a position that is actually inside the section that we are currently looking at.
So far, that's all more or less coherent. Unfortunately, there's a problem. Let us try to assign an ordering to a string that starts left-to-right and ends right-to-left:
A B C x y z (logical)
0 1 2 3 4 5 6
A B C z y x (visual)
0 1 2 3 5 4 ?
Because the second (zyx) section isn't dominant, the positions on its boundaries aren't biased towards it. Thus, cursor position 3 should obviously be placed after C in the visual order. That leaves only 6, the one offset not assigned to any other position, for the position at the end, marked with a question mark. But there is very little sense in placing it there—at least, the algorithms described above don't automatically do it.
As a kludge, I made the algorithm that produces the sections, whenever the last section's direction doesn't agree with the first section, insert an empty, zero-length section with the dominant direction at the end. This, being dominant, will be associated with the position at the end of the string, and cause the ordering and cursor drawing to work out as hoped.
Another feature that was needed to make working with Hebrew text bearable is recognizing combining characters.
If I write 'é', your browser will probably display that as an E with an acute (forward) accent, even though the source for this page actually contains two characters, first an 'e' and then Unicode point 769 (COMBINING_ACUTE_ACCENT). Such characters are rarely used in Latin languages, because Unicode point 233 (LATIN_SMALL_LETTER_E_WITH_ACUTE) fills the same role just fine in a single character. But in Hebrew (as well as several other languages), the combinations are so numerous that assigning a code point to every one isn't practical, and thus people actually use such combining characters.
When editing text with such combining characters, since only a single glyph is displayed for a series of one non-combining and N combining characters, the cursor will appear to stay in the same place when inside this series. This is very annoying, and it seems preferable to simply skip over the whole section in a single jump.
In Unicode terminology, the code points that are combining/continuing characters are recognized by the Grapheme_Extend derived core property, as listed in this file. The number of ranges listed there is huge, so, as a crummy trade-off between correctness and code size, I only took the ranges of a number of scripts (Latin, Hebrew, Arabic) and made those into a big regular expression that the editor can use to recognize continuing characters, leaving out a whole range of other languages.
(Of course, The Right Thing would have been for browsers to expose a JavaScript API for getting Unicode character properties, since they internally have this information anyway—they need it to properly display text. I expect it'll probably be another five to ten years before such an API is considered important enough to standardize. Unicode adoption is a slow process.)
Having this regular expression, I simply make sure that cursor movement by keyboard or mouse always puts the cursor on the boundary of a visual glyph, never before a combining character.
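A hedged sketch of the idea (the ranges below are illustrative, not the exact set CodeMirror uses): a regular expression matching some common combining marks, and a helper that keeps a position off of them.

var extendingChar = /[\u0300-\u036f\u0591-\u05bd\u05bf\u0610-\u061a\u064b-\u065f]/;

function skipExtendingChars(line, pos, dir) {
  // Move pos in direction dir (+1 or -1) until it no longer sits right
  // after a combining character, i.e. inside a visual glyph.
  while (pos > 0 && pos < line.length && extendingChar.test(line.charAt(pos)))
    pos += dir;
  return pos;
}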
While the arrow keys have a visual arrow on them suggesting a certain direction, backspace and delete imply the deleting of characters respectively before and after the cursor, where before and after are interpreted relative to the direction of the text. So in right-left-text, backspace will delete the character to the right of the cursor.
This means that determining the range to delete in response to these characters is done by looking at logical rather than visual positions. I also chose not to take combining characters into account when handling these, so that pressing backspace after 'é' (E + combining accent) will leave you with just 'e'—i.e. you delete characters, not glyphs.
As mentioned, being only a simple Dutch speaker exposed mostly to Western languages, I expect to be missing half of the subtleties of bi-directional editing. I will update this post with corrections as they come in.
Regardless of that, I hope this write-up turns out to be useful to somebody. Figuring all this out without much guidance was a major time sink. I'd be glad to save someone else the bother.
Support for both FFI bindings and talking to a wish shell.
Have hardly any 'wrapper' functionality — you're directly driving a Tcl interpreter from Lisp.
They've only been used in one medium-sized project so far, but they are so simple that I'm rather confident they work as intended.
Project page at http://marijn.haverbeke.nl/cl-tk/.
There already exists a comparable library called CL-JSON. I originally wrote a new one because the way CL-JSON uses nil to encode all of boolean false, the empty list, and the empty object was causing headaches, and later I added some other extensions.
Though it tends to get treated as one, HTTP is not just a dumb file-transfer protocol. It allows you, to a certain degree, to specify an intention with your requests (GET/POST, with PUT and DELETE thrown in if you really want to), it has a somewhat structured method for parameter passing, supports content negotiation, does authentication. But what I want to talk about here is caching.
Until recently, my experiences with caching had been mostly in the form of fighting against it. Any web developer will have experienced browsers (one of them in particular) inappropriately caching pages and scripts — causing users to see old content, or load broken scripts even after we fixed them. Typically, you then added some magical headers that you found through a random web-search and behold, the browser stops caching and all is well.
The reason browsers behave like this is, of course, that it is a lot of work to constantly fetch all these files. Not doing it saves bandwidth, and makes pages load quicker. When it works, this is decidedly a positive thing. The original vision for HTTP was for a more static world than today's web, where clients and proxies could happily cache content, dramatically reducing the load on the infrastructure. It did not work out that way, generally, since somewhere near the turn of the century the use of Perl and PHP to generate pages exploded, and every web-developer who was with it started to insist on making everything completely dynamic. Some of the fads that were born then (visitor counters!) have since died out, but the idea of web pages I visit being tailored just for me (showing whether I'm logged in, what I'm allowed to do), and being derived from databases rather than static files, has become 'the way things are done'.
This dynamic approach is relatively easy to set up, and for systems that are not seeing heavy use it works perfectly well. Only when a site is being accessed heavily does the wastefulness of such an approach become apparent: you are building up a page from scratch — probably involving a bunch of calls to a database, some file accesses, potentially some expensive computation — for every single hit, even though the page returned is likely to be very similar to the one you generated 37 milliseconds ago for another request.
One solution is to use a system like memcached to cache chunks of data and fragments of pages on the server side. For a lot of situations, though, HTTP itself provides a well-specified and elegant model for caching content.
There are two forms of caching in HTTP: The expiration model and the validation model. In the first, the server includes a header in its response that tells the client how long the content stays 'fresh'. The client is allowed to use this content as long as it is fresh without checking back with the server. Using this with a long expiration period is very rarely what you want, since you more or less lose control over what the user is seeing, but does have the advantage that repeated accesses can be done without causing any server load. Sometimes useful for stuff that you know won't change, such as archived content.
The second model, validation, is more interesting. Here the server sends a piece of information identifying the version of the current content, either in the form of a last-modified date, or an 'entity tag' — an opaque identifying string, for example an MD5 hash of the content. On subsequent requests, the client may send a header indicating the version it currently has cached, and the server has the choice of sending back a response with status code 304 (not modified) if that version is still up to date. If it is not, you can proceed as normal and generate the proper content. Web servers typically do this automatically when serving static files, using the file-system's modification times.
To use expiration caching, you simply send along a header like this:
Expires: Tue, 21 Jul 2009 10:11:33 GMT
Note the convoluted date format. Your web library hopefully has functions for reading and writing timestamps in that format.
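If you happen to be generating the header from JavaScript, Date's toUTCString produces a string in exactly this format, so a small helper could look like this (illustrative only):

// Produce an Expires header some number of seconds in the future.
function expiresHeader(seconds) {
  return "Expires: " + new Date(Date.now() + seconds * 1000).toUTCString();
}
console.log(expiresHeader(24 * 60 * 60)); // e.g. "Expires: Tue, 21 Jul 2009 10:11:33 GMT"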
Validation caching requires you to add either a Last-Modified or an ETag header to your responses, or both...
Last-Modified: Mon, 21 Jul 2008 08:32:35 GMT
ETag: "xyzzy"
Cache-Control: max-age=0, must-revalidate
(The Cache-Control header tells the browser that it is not okay to re-use its cached version of the content without asking the server whether it is up-to-date.)
Before responding, you determine the resource's current last-modified date or entity tag, and check for If-Modified-Since or If-None-Match headers. When an If-Modified-Since header with a date no older than the resource's modification date is given, you immediately respond with a 304 response, and do not have to generate your page. The same happens when an If-None-Match header is given that includes the current entity tag — though in this case, you have to make sure to re-send the ETag header along with the 304 response.
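A minimal Node.js-flavored sketch of that flow (etagFor and renderPage are assumed application-specific helpers, and the check is simplified: it ignores lists of tags, weak validators, and If-Modified-Since):

var http = require("http");

http.createServer(function(req, res) {
  var etag = '"' + etagFor(req.url) + '"';          // assumed: current entity tag for this resource
  if (req.headers["if-none-match"] == etag) {
    res.writeHead(304, {"ETag": etag});             // client is up to date: skip generating the page
    res.end();
    return;
  }
  res.writeHead(200, {
    "ETag": etag,
    "Cache-Control": "max-age=0, must-revalidate",
    "Content-Type": "text/html"
  });
  res.end(renderPage(req.url));                     // assumed: generates the actual content
}).listen(8080);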
(For the fine print on any of this, consult the HTTP 1.1 specification — which is relatively concise and readable, and more authoritative than a lot of the stuff that gets written about the subject online.)
The tricky aspect of this is, of course, reliably determining when a page should be considered modified. How this works depends entirely on the application. For a blog it can be relatively simple, for a complicated site full of dynamic widgets it might be impossible. If you take cacheability into account while designing your site, and avoid obvious things like showing the current time on the page, this doesn't have to be difficult. One useful trick is to have JavaScript take care of some dynamic aspects, such as showing the name of a logged-in user, and hiding controls that he or she does not have access to (though this does have some accessibility ramifications).
Just getting people's browser to cache stuff, while it can help, is hardly a life-saver. The beauty of the HTTP protocol is that if you do caching right, it is very simple to add your own proxy server in front of your server, and have it cache requests for multiple clients. The proxy will behave in almost the same way as a client, understanding cache-related headers and asking for modified content at the appropriate time, and is relatively easy to 'drop in' when load becomes an issue.
One likely way to screw up when using a proxy is being too liberal with your caching. If you do render the name of the logged in user in your HTML, you do not want your proxy to serve the page with Bob's name in it to Alice. And if you render a page showing a user's private info, such as credit card number (well, you should probably never do that, certainly not over non-secure HTTP), you clearly do not want that page to be cached and sent to someone else. There are a few more directives that can be given to the Cache-Control header for cases like this, and will be respected by any half-decent proxy program. 'private' indicates that the response is meant only for the current recipient, and that only that recipient should cache it. 'no-store' can be used to tell any and all caches to never store this response on disk. It is a good idea to add that whenever you are returning sensitive information that you do not want to accidentally end up on someone's hard disk.
Finally, for services that provide some kind of remote procedure call interface — XML-RPC, JSON, whatever, as long as it is HTTP — determining last-modified dates or entity tags right is often quite simple, since such requests tend to be more focused than web page requests. You could even use a proxied HTTP service internally in your server to access data that benefits from local caching, as an alternative to memcached.
~:@ format construct to upcase parts of the symbol's name. This works fine on standard CLs, but if you want to write something that also works with Allegro's 'modern' mode (where symbols are case-sensitive), you don't want to upcase the symbol. What you do there is use the reader against itself — (format nil "~a-~a" :insert name-symbol), the symbol-name of :insert will be whatever the reader made of it, and thus you'll get a symbol that follows the same conventions as the surrounding system.
(You also don't want to use uppercase strings in your package definitions — I'm looking at you, split-sequence — use #:symbol syntax if you don't want to waste memory on pointless keyword symbols.)
In a similar vein, sometimes you'll want to create throwaway symbols with a certain name at run-time. (For example, S-SQL requires symbols for stuff like database index names, which you might want to generate.) intern leaks memory in this case, since anything interned stays around until it is uninterned. gensym tends to add junk to the symbol's name. Some messing around with apropos (more languages need an apropos feature) led me to the predictably named make-symbol, which, like #: syntax, creates an uninterned symbol with a specific name.