Architecture of the Corinthia Editor

Corinthia is an open source project aimed at developing a set of tools for editing and converting documents in various word processing file formats, possibly expanding to other classes of “office” documents in the future. Currently the only properly-supported file formats are HTML and Microsoft Word’s .docx (part of OOXML), with support for ODF on the way. To facilitate interoperability and integration with web technologies, HTML is used as the native file format. For the editor, this has the advantage of allowing us to build upon the rendering and scripting facilities of modern browser engines.

The editor described in this article is a library, not an application: there is no user interface and no way to use it directly. Instead, it is designed as a component that can be used in web, desktop, or mobile applications. Separate efforts are underway within the Corinthia project to develop such applications, and you can also build your own if you wish. The code is licensed under the Apache License, Version 2.0, allowing it to be used in both commercial and open source products.

All of the code is written in TypeScript, which compiles down to JavaScript. The library is compatible with most major browser engines, including those of Safari, Chrome, Firefox, and Edge. By using HTML and the DOM APIs, the in-memory model used during editing directly mirrors the native file format. For non-native file formats such as Microsoft Word, the document is first converted to HTML at load time, and then converted back on save. This is handled by a separate library, which is not covered here.

This article gives an overview of how the editor works, primarily in terms of internal implementation details, although hopefully this information will also be helpful for those using the public API. I’ll avoid getting into details of the code, since the implementation, and to a lesser extent the API, is likely to change over time as the project evolves; my intention is just to introduce the fundamental concepts. For code-level documentation, consult the corinthia-editor repository on GitHub.

Isolation and the public API

The editor library enforces a clear separation between the public API and internal implementation details. The API provides operations like moving the cursor, inserting text, and changing formatting. It does not expose the DOM tree directly; this is considered an implementation detail, and the library relies on the assumption that it is in full control of the DOM. For this reason, the library code must be loaded and executed in its own context — for web applications, this requires an <iframe> element, and for native applications, a dedicated web view control. You should not load any custom or third-party scripts into the frame unless you really know what you’re doing, as these are highly likely to interfere with the operation of the library.

The API is designed to be usable both by JavaScript or TypeScript applications and by native applications written in other languages for which a bridge is available. For example, in iOS, there is a method on the system-provided web view class that allows code written in Swift or Objective-C to evaluate a string containing JavaScript code in the context of a web view, and receive the result as a string. To allow for simple, string-based invocation protocols like this, all public APIs exposed by the library deal solely with either primitive types or structured types that can be represented in JSON.
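To make the string-based protocol concrete, here is a minimal sketch of how a bridge entry point might look. Everything in it is an assumption made for illustration: the invoke function, the request shape, and the two API names are hypothetical and are not taken from the library.

```typescript
// Hypothetical sketch: the function names and request format are illustrative,
// not the library's actual API.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type ApiFunction = (...args: Json[]) => Json;

// A toy document state so the example has something to operate on.
let documentText = "";

const api: Record<string, ApiFunction> = {
  "cursor.insertText": (text) => {
    if (typeof text !== "string") throw new Error("text must be a string");
    documentText += text;
    return null;
  },
  "document.getLength": () => documentText.length,
};

// Entry point a native bridge could call: JSON string in, JSON string out.
function invoke(requestJson: string): string {
  try {
    const request = JSON.parse(requestJson) as { name: string; args?: Json[] };
    if (!Object.prototype.hasOwnProperty.call(api, request.name)) {
      throw new Error(`unknown function: ${request.name}`);
    }
    const result = api[request.name](...(request.args ?? []));
    return JSON.stringify({ ok: true, result });
  } catch (err) {
    // Errors are also reported as JSON, since exceptions cannot cross the bridge.
    return JSON.stringify({ ok: false, error: String(err) });
  }
}
```

A native host would pass a string such as '{"name":"cursor.insertText","args":["hello"]}' to invoke and parse the string it gets back. Because only JSON crosses the boundary, the same entry point works from any language that can evaluate JavaScript in a web view.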

Functions exposed by the API are divided into categories such as formatting, selection, and outline. Library code is only executed when the application explicitly invokes one of these APIs — the library itself does not install any event handlers or timers. The application is responsible for intercepting mouse, keyboard, and touch events and invoking the appropriate APIs, as well as triggering any time-related actions such as flashing the cursor on and off.

Often, an API call will result in one or more events — notifications that may be of interest to the application. In the implementation code, these are added to a list of pending events during execution, and only when an API call has completed are these events dispatched. This ensures that if, in response to an event, the application wishes to take any actions that involve making further API calls, it can do so safely. The reason for this is that the library implementation is not designed with reentrancy in mind — that is, it is not designed to handle API calls being made while other API calls are already in progress.

For JavaScript or TypeScript applications, it is possible at initialization time to register a callback object that has methods for each type of event; these will be invoked just before an API call returns, but after the internal implementation of that API has completed execution. For other languages, it is possible to retrieve the list of events that have occurred as an array, which the application can then iterate through, responding to those events in which it is interested. This event handling code is best implemented as part of the native language bindings, rather than being triggered directly from the rest of the application, to ensure events are responded to as soon as possible.
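The deferred dispatch described above can be sketched roughly as follows. The EventQueue class and its post and run methods are hypothetical names chosen for this example; the sketch only shows the ordering guarantee, not how the library actually implements it.

```typescript
// Hypothetical names throughout; this only illustrates deferred dispatch.
interface EditorEvent {
  type: string;
  detail?: string;
}
type Listener = (event: EditorEvent) => void;

class EventQueue {
  private pending: EditorEvent[] = [];
  private running = false;

  constructor(private listener: Listener) {}

  // Called by implementation code while an API call is executing.
  post(event: EditorEvent): void {
    this.pending.push(event);
  }

  // Wraps the body of a public API function.
  run<T>(body: () => T): T {
    if (this.running) throw new Error("reentrant API call");
    this.running = true;
    let result: T;
    try {
      result = body();
    } finally {
      this.running = false;
    }
    // The call has completed, so the listener may safely make further API calls.
    const events = this.pending;
    this.pending = [];
    for (const event of events) this.listener(event);
    return result;
  }
}
```

The important detail is that the listener only runs after the running flag has been cleared, so an event handler that calls back into the API is treated as a fresh call rather than a nested one.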

Cursor and Selection

Two of the most fundamental elements of a text editor are the cursor and the selection. The cursor is a position in the document at which text will be inserted or deleted. The selection is a range encompassing all content between a start and end position; it indicates the text to which operations like deletions, replacements, and formatting changes should be applied. The Corinthia editor treats a cursor as a special case of a selection: one that contains no text, and whose start and end positions are identical.

In the context of the DOM, we define a position as a (node, offset) pair. A position is always either between two items in a sequence of nodes or characters, or at the beginning or end of the sequence. For an element node, the offset indicates the index of the child node that the position precedes, or the number of child nodes if it follows the last child. For a text node, the offset indicates the index in the string value that the position precedes, or the length of the string value if it follows the last character. A range is simply a pair of positions.

When performing a content-insertion operation (such as adding text, an image, or pasting content), the action depends on whether the selection is currently empty (a cursor). If not, the content within the selection is first deleted, causing the selection to become empty. Once this has been done, or if the selection was already empty, the new content is inserted at the current position.
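The two-step rule above can be illustrated with a plain string standing in for the document. This is a simplified sketch: the TextSelection type and insertContent function are hypothetical, and placing the cursor after the inserted content is my assumption about the resulting state rather than something stated above.

```typescript
// A plain string stands in for the document; offsets are character indexes.
interface TextSelection {
  start: number;
  end: number;
}

function insertContent(
  text: string,
  sel: TextSelection,
  content: string
): { text: string; selection: TextSelection } {
  // Step 1: a non-empty selection is deleted first, collapsing it to a cursor.
  if (sel.start !== sel.end) {
    text = text.slice(0, sel.start) + text.slice(sel.end);
    sel = { start: sel.start, end: sel.start };
  }
  // Step 2: insert at the cursor position.
  text = text.slice(0, sel.start) + content + text.slice(sel.start);
  // Assumption: the cursor ends up immediately after the new content.
  const cursor = sel.start + content.length;
  return { text, selection: { start: cursor, end: cursor } };
}
```

For example, inserting "there" while the word "world" is selected in "hello world" first removes "world" and then places "there" at the resulting cursor.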

In implementation terms, we use two classes, Position and Range. Every Position object has node and offset properties, and every Range object has start and end properties. Both classes provide methods to query and manipulate the respective objects.
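As a rough picture of these two classes, here is a sketch that runs without a browser. The names EditorPosition, EditorRange, and SketchNode are stand-ins I have chosen to avoid clashing with the DOM's own global types, and the methods shown are illustrative guesses rather than the library's real ones.

```typescript
// Minimal stand-in for DOM nodes so the sketch runs without a browser.
interface SketchNode {
  text?: string;           // present for text nodes
  children?: SketchNode[]; // present for element nodes
}

class EditorPosition {
  constructor(public node: SketchNode, public offset: number) {}

  // Largest legal offset: string length for text nodes, child count for elements.
  maxOffset(): number {
    return this.node.text !== undefined
      ? this.node.text.length
      : (this.node.children ?? []).length;
  }

  // A position must lie between two items, or at either end of the sequence.
  isValid(): boolean {
    return this.offset >= 0 && this.offset <= this.maxOffset();
  }

  equals(other: EditorPosition): boolean {
    return this.node === other.node && this.offset === other.offset;
  }
}

class EditorRange {
  constructor(public start: EditorPosition, public end: EditorPosition) {}

  // A cursor is a range whose start and end positions coincide.
  isEmpty(): boolean {
    return this.start.equals(this.end);
  }
}
```

Note that an offset equal to the length is valid in both cases, since a position may follow the last character or the last child.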

Tracking Positions and Ranges

For a position or range to remain valid after changes have occurred in the DOM tree, it must be tracked while those changes are being made. Tracking keeps the position up-to-date so that it remains in the same relative location after the changes. For example, if a position is at the end of a text node, and one or more characters are deleted from the start, the offset must be decreased by the appropriate amount, so that it does not refer to an offset in the string that is greater than the length. Similarly, if a position is between two nodes, and one or more other nodes are inserted before it in the parent, the offset must be increased so it still remains between the same two nodes as before.

Enabling tracking for a position causes its node and offset to be updated in response to changes to the DOM tree whenever necessary. Usually only the offset changes, due to insertions or deletions of preceding siblings. Sometimes, however, the node must also be changed, such as when the node is removed from the DOM tree; in such a case, the position will be updated to refer to the nearest node that still remains in the tree. Enabling tracking for a range is equivalent to enabling tracking for its start and end positions.
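The adjustments described in this section can be sketched as follows, using a tiny tree model in place of the DOM. All names (TNode, trackedPositions, and the three mutation functions) are hypothetical. Two choices are my own assumptions: a position exactly at an insertion point stays put, and a position inside a removed node moves to the gap the node left in its parent.

```typescript
// Minimal stand-in for the DOM; all names here are illustrative.
class TNode {
  parent: TNode | null = null;
  children: TNode[] = [];
  constructor(public text: string | null = null) {} // non-null for text nodes
}

interface TrackedPosition {
  node: TNode;
  offset: number;
}
const trackedPositions: TrackedPosition[] = [];

function isInside(node: TNode, ancestor: TNode): boolean {
  for (let n: TNode | null = node; n !== null; n = n.parent) {
    if (n === ancestor) return true;
  }
  return false;
}

function insertChild(parent: TNode, child: TNode, index: number): void {
  parent.children.splice(index, 0, child);
  child.parent = parent;
  for (const pos of trackedPositions) {
    // Positions after the insertion point shift right to keep the same neighbours.
    if (pos.node === parent && pos.offset > index) pos.offset++;
  }
}

function removeChild(child: TNode): void {
  const parent = child.parent;
  if (parent === null) throw new Error("node has no parent");
  const index = parent.children.indexOf(child);
  for (const pos of trackedPositions) {
    if (isInside(pos.node, child)) {
      // The node is leaving the tree: fall back to the gap it leaves behind.
      pos.node = parent;
      pos.offset = index;
    } else if (pos.node === parent && pos.offset > index) {
      pos.offset--;
    }
  }
  parent.children.splice(index, 1);
  child.parent = null;
}

function deleteCharacters(node: TNode, start: number, count: number): void {
  if (node.text === null) throw new Error("not a text node");
  node.text = node.text.slice(0, start) + node.text.slice(start + count);
  for (const pos of trackedPositions) {
    if (pos.node !== node || pos.offset <= start) continue;
    // Offsets inside the deleted run collapse to its start; later ones shift left.
    pos.offset = Math.max(start, pos.offset - count);
  }
}
```

In this sketch every mutation function walks the full list of tracked positions, which is simple but linear in the number of positions; a real implementation might index them by node.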

Position References

The Position and Range classes are internal to the implementation, because they directly reference DOM nodes, which the public API deliberately does not expose. Instead, applications deal with references to positions, which are opaque values obtained in response to various queries. These queries include asking for the start or end of the selection or the document, as well as nearby positions at specified offsets in various directions (up, down, left, and right) and granularities (character, word, line, sentence, paragraph, and document).

All position references remain valid only until a change is made to the document. The public API is divided into functions that change the document, such as inserting text or changing formatting, and those that don’t, such as retrieving formatting information or adjusting the selection. The positions corresponding to the references supplied to the application remain tracked until one of the document-changing functions is called. If an attempt is made to use a position reference that is no longer valid, an exception is thrown. When a change is made to the document, the application receives a “document modified” event, so it can invalidate or remove any position references it has in memory.
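One way to picture the lifetime of position references is a table that maps opaque strings to internal positions and is emptied whenever the document changes. The PositionRefTable class, its method names, and the reference string format below are all hypothetical; the library's actual representation may be quite different.

```typescript
// Hypothetical sketch; none of these identifiers are taken from the library.
interface InternalPosition {
  nodeId: string; // stands in for a direct DOM node reference
  offset: number;
}

class PositionRefTable {
  private generation = 0;
  private nextId = 1;
  private entries: { [ref: string]: InternalPosition } = {};

  // Hand out an opaque reference that is valid until the next document change.
  create(pos: InternalPosition): string {
    const ref = `pos-${this.generation}-${this.nextId++}`;
    this.entries[ref] = pos;
    return ref;
  }

  // Look up a reference, throwing if it has been invalidated or never existed.
  resolve(ref: string): InternalPosition {
    if (!Object.prototype.hasOwnProperty.call(this.entries, ref)) {
      throw new Error(`invalid position reference: ${ref}`);
    }
    return this.entries[ref];
  }

  // Called by every API function that modifies the document.
  documentModified(): void {
    this.entries = {};
    this.generation++;
  }
}
```

Dropping every entry at once also means the implementation no longer needs to track those positions, which keeps the cost of tracking bounded by what the application has asked for since the last change.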

Undo & Redo

Undo can be implemented in terms of state or operations. In the former case, before each operation is performed, the entire state of the application’s data model is saved. A list of states is kept, and as the user steps backwards and forwards through the history (e.g. by pressing the undo and redo buttons), the entire state from each point in history is restored. In the latter case, a set of operations is defined on the data model, each with an inverse operation that “undoes” the operation. A list of operations is kept, and as the user steps backwards and forwards through the history, either the inverse operation is performed (for undo), or the original operation is performed again (for redo). The Corinthia editor uses the latter model.

When determining what operations to define, it is important to choose an appropriate level of abstraction — that is, how far removed each operation is from the actual manipulation of bits in memory. High-level operations involve lots of individual steps, making it a complex problem to define an inverse of the operation. Low-level operations involve only one or a few steps, making it a much simpler problem to define the inverse.

Our application state is modeled by the DOM tree, so all we need is core operations for inserting, deleting, and modifying nodes. In implementation terms, these simply call through to the underlying browser-provided API, and simultaneously record the necessary undo information in the form of an inverse operation. To undo an insertion of a node, we simply remove it from the tree. To undo a removal of a node, we insert it back at its previous location (for which we need to store the parent and next sibling). To undo the setting of an element’s attribute or a text node’s string value, we replace it with the old value. The simplicity of these actions, and the fact that almost everything we do is ultimately modeled in terms of these core DOM operations, makes it straightforward to represent the undo history and step back and forth through it.

One caveat to this is that all mutations of the DOM tree must go through our undo-aware operations, and none of the code in the editor can ever call the native APIs directly. Doing so would lead to unexpected results when stepping back through the history, leaving the document in an inconsistent state and possibly causing exceptions. If you are working on the codebase, you must therefore always use the wrapper functions that implement these operations, instead of calling methods like appendChild() or setAttribute() on node objects directly. The library has many internal utility functions that perform higher-level operations on the DOM tree and are themselves defined in terms of the lower-level core operations; these automatically inherit their reversibility through their use of those operations.

The other caveat is that almost all of the state is represented in the DOM — but there are a few cases where we store extra information, such as the outline structure, the set of words highlighted by the spellchecker, and the current selection. These pieces of state also have operations defined on them which are undo-aware, and must always be modified through those operations only, not directly. To the extent possible, the code enforces this by hiding the relevant data inside private variables.

For representing inverse operations in the undo history, we use closures — dynamically-created functions that capture the necessary values to pass to one of the core operations to undo the original operation. The undo history is organized into groups, each representing a set of operations that will be undone or redone each time the undo or redo API function is called (typically in response to an undo or redo button being pressed in the application UI). A group contains an array of functions which are executed in the correct order to carry out the changes in the appropriate direction.
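Putting the pieces of this section together, the following sketch shows undo-aware core operations that record closures, collected into groups. It uses a tiny tree class instead of the real DOM so that it can run anywhere, and every name in it is hypothetical. One design choice is specific to the sketch: the inverses are themselves built from the core operations, so undoing a group automatically records the group needed to redo it.

```typescript
// Minimal stand-in for DOM nodes so the sketch runs outside a browser.
class UNode {
  parent: UNode | null = null;
  children: UNode[] = [];
  attributes: { [attr: string]: string | undefined } = {};
  constructor(public tag: string) {}
}

type Action = () => void;
let currentGroup: Action[] = []; // inverses recorded since the last group boundary
const undoStack: Action[][] = [];
const redoStack: Action[][] = [];

// Core operations: each performs the change and records a closure reversing it.
function insertBefore(parent: UNode, child: UNode, next: UNode | null): void {
  const index = next === null ? parent.children.length : parent.children.indexOf(next);
  if (index < 0) throw new Error("next is not a child of parent");
  parent.children.splice(index, 0, child);
  child.parent = parent;
  currentGroup.push(() => removeNode(child));
}

function removeNode(child: UNode): void {
  const parent = child.parent;
  if (parent === null) throw new Error("node is not in a tree");
  const index = parent.children.indexOf(child);
  // Remember the parent and next sibling so the node can be put back.
  const next = parent.children[index + 1] ?? null;
  parent.children.splice(index, 1);
  child.parent = null;
  currentGroup.push(() => insertBefore(parent, child, next));
}

function setAttribute(node: UNode, attr: string, value: string | undefined): void {
  const old = node.attributes[attr];
  node.attributes[attr] = value;
  currentGroup.push(() => setAttribute(node, attr, old));
}

// Close the current group; new changes make any redo history obsolete.
function endGroup(): void {
  if (currentGroup.length === 0) return;
  undoStack.push(currentGroup);
  currentGroup = [];
  redoStack.length = 0;
}

function replay(from: Action[][], to: Action[][]): void {
  const group = from.pop();
  if (group === undefined) return;
  currentGroup = [];
  // Run the recorded closures in reverse order of recording.
  for (let i = group.length - 1; i >= 0; i--) group[i]();
  to.push(currentGroup); // the closures just recorded reverse this replay
  currentGroup = [];
}

function undo(): void {
  endGroup();
  replay(undoStack, redoStack);
}

function redo(): void {
  endGroup();
  replay(redoStack, undoStack);
}
```

Any higher-level helper written purely in terms of insertBefore, removeNode, and setAttribute becomes reversible for free, which is the property the wrapper-function rule above is meant to protect.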

Outline

The editor was originally designed for writing large, structured documents like reports and books. For texts like these, it is typical to divide the document up into chapters, sections, subsections and so forth, as well as having numbered figures and tables. Those familiar with LaTeX will most certainly have dealt with these concepts, as will Microsoft Word users familiar with the built-in heading styles. In HTML, the document structure is expressed using elements like <h1> and <h2>, which are analogues of LaTeX commands like \section and \subsection.

The editor provides explicit support for working with this structure via APIs and event notifications that deal with the document’s outline. This outline consists of a tree of sections, as well as a list of tables and figures, corresponding to the <table> and <figure> elements in HTML. Collectively, the sections, figures, and tables are referred to as outline items. Each has an id attribute set on its corresponding HTML element, which is used in the API to refer to a given item.

All outline items may have a text value and a number optionally associated with them. For sections, the text value is the title, and is required. For tables and figures, the text value is the caption, and is optional. Numbering, also optional, assigns a numeric prefix to the outline item’s text value; this prefix is calculated based on the section hierarchy, such as 2.3 for the third subsection within the second section.
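As an illustration of how hierarchical numbers such as 2.3 can be derived from the section tree, here is a small sketch. The Section interface and numberSections function are hypothetical, and for simplicity the sketch assumes numbering is enabled for every section, whereas numbering is optional in the editor.

```typescript
// Hypothetical model of the section tree; numbering is assumed to be on everywhere.
interface Section {
  title: string;
  children: Section[];
}

// Returns "number title" labels in document order.
function numberSections(sections: Section[], prefix: string = ""): string[] {
  const labels: string[] = [];
  sections.forEach((section, index) => {
    // A section's number is its parent's number plus its 1-based position.
    const num = prefix + (index + 1);
    labels.push(num + " " + section.title);
    for (const label of numberSections(section.children, num + ".")) {
      labels.push(label);
    }
  });
  return labels;
}
```

Because each number depends on the positions of all earlier siblings and ancestors, inserting, deleting, or moving one section can change the numbers of many later ones.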

API functions are provided for moving and deleting parts of the outline, enabling the application to provide an “outline view” or “document map” widget that lets the user navigate the document and manipulate its structure easily. The text value of an item can also be set directly, making it possible to edit titles and captions from an outline widget instead of through the regular editing functionality.

Outline updates work in the other direction as well. Any time the title of a section or the caption of a figure or table changes, an event is dispatched to notify the application of the change, so it can update the UI accordingly. Similarly, a change to an outline item’s number due to an insertion, deletion, or move of an earlier outline item of the same type will also trigger an event.

References to outline items can also be inserted into the document. These appear as links in the HTML content, such that clicking on a reference takes the reader to the corresponding section, figure, or table. References include the number and/or text value of the outline item in the text of the link. This text is automatically updated whenever the referenced outline item changes, at the same time as the events are dispatched.

Finally, three types of front matter are supported: a table of contents, a list of tables, and a list of figures. These show all of the sections, tables, or figures in the document respectively, and are dynamically updated in the same way as references.

Conclusion

The overview given here covers most, but not all, of the important concepts used in the Corinthia editor library. At the time of writing, code-level documentation is minimal, but this will change in the near future, hopefully by the time you read this. The most important things to keep in mind when modifying the library are maintaining a clear separation between the public API and the implementation, and always using the provided undo-aware functions for modifying the DOM tree, instead of doing so directly.

A lot more remains to be written about the library, and about the Corinthia project in general, which I’ll address in future posts. Right now (as of March 2016) we have a small community involved, but are keen to hear from others interested in portable, web-based office applications, since we believe there is a big gap to be filled by open source in a space that is currently dominated by large, commercial players. For a general introduction to the project as a whole, see the talk I gave at ApacheCon EU 2014. I also encourage you to join our mailing list if you have any questions or thoughts to discuss.