HTML5, LegisPro Web, Standards, Track Changes, W3C

Building a browser-based XML Editor

Don’t forget the 2014 U.S. House Legislative Data and Transparency Conference this week.

I’m now hard at work on our second generation web-based XML editor. In my blog last week, I talked about the need for and complexities of change tracking in a legislative editor. In this blog, I want to describe more of the overall motivation.

A couple years ago, we built an HTML5-based legislative editor for Akoma Ntoso. We learned a lot from the effort and had some success with a couple customers whose needs matched the capabilities of the editor. The editor was built to use and exploit, to the fullest extent, many of the new APIs added to modern browsers to support HTML5. We found that, by focusing on HTML5, a lot of the complexities of dealing with browser quirks and incompatibilities were a thing of the past – allowing us to focus on building the editing functions.

The editor worked by transforming the XML document into a close representation of the XML, expressed as HTML5 tags. Using HTML5 features such as the @contenteditable attribute along with modern CSS, the browser DOM, selection ranges, drag and drop, and a WebDAV repository API, we were able to implement a fairly sophisticated web-based legislative editor.

But, not everything went smoothly. The first problem involved the complexity of mapping all the intricacies of XML into an HTML5 representation, and then maintaining that representation in the browser. Part of the difficulty stems from the fact the HTML5 is not specifically an XML dialect – and browsers tend to do HTML5 things that aren’t always XML friendly. The HTML5 DOM is deliberately rather loose and forgiving (it’s a big part of why HTML was successful in the first place) while XML demands a very precise and rigid DOM.

The second problem we faced was scalability. While the HTML5 representation wasn’t all that heavyweight, the bigger problem was the transformation cost going back and forth between HTML5 and XML. We sometimes deal with very large legislation and laws. In our bigger cases, the cost of transformation was simply unreasonable.

So what is the solution? Well, early last year we started experimenting with using a browser to render XML documents with a CSS directly – without any transform into HTML. Most modern browsers now do this very well. For the most part, we were able to achieve an acceptable rendition in the browser without any transformation.

There were a few drawbacks to this approach. For one, links were dead – they didn’t inherently do anything. Likewise, implementing something like the HTML @style attribute didn’t just naturally work. Before we could entertain the notion of a pure XML-based editor built within the XML infrastructure in the browser, we had to find a solution that would allow us to enrich the XML sufficiently to allow it to behave like an HTML page.

Another problem arose in that our prior web-based editor relied upon the @contenteditable feature of HTML. That is an HTML feature rather than a browser feature. Using XML as our base environment, we no longer had access to this facility. This wasn’t a total loss as our need for a rich change tracking environment required us to find a better approach that @contenteditable offered anyway.

With solutions to the major problems behind us, we started to take a look at the other goals for the editor:

  • Track Changes – This was the subject of my blog last week. For us, track changes is crucial in any editor targeted at legislation – and it must work at both the structural and textual level equally well. We use the feature for two things – redlining changes as is common in the U.S. and the automatic generation of amendment documents (amendments in context). Differencing can get you part way there – but it excludes the ability to adequately craft the changes in a way that deal with political sensitivities. Track Changes is a very complex feature which must be built into the very core of the editor – tacking it on later will be very difficult, if not impossible.
  • Scalability – Scalability is very important to our applications. We need to support very large documents. Even when we deal with document fragments, we need to allow those fragments to be very large. Our approach is to create editing islands within a large document loaded into the browser. This amounts to only building the editing superstructure around the parts of the document being edited rather than the whole document. It’s like building the scaffolding around only the floors being worked on in a skyscraper rather than trying to envelope the entire building in scaffolding.
  • Modularity – We’re building a number of very different applications currently – all of which require XML editing. To allow this variability, our new XML editor is written as a web-based component rather than a full-fledged application. Despite its complexity, on the surface it’s deceivingly simple. It has no user interface at all aside from the editing canvas. It’s completely driven by a well thought out JavaScript API. Adding the editor to a document is very simple. A single link, added to the bottom of the XML document, adds the editor to the document. With this component, we’re able to include it within all of the applications we are building.
  • Configurability – We need to support a number of different models – not just Akoma Ntoso. To achieve this, an XML-based configuration file is used to define the behaviors for any XML model. Elements can be defined as read-only, templates can be defined (or derived), and even the track changes behavior can be configured for individual elements. The sophistication being defined within the configuration files is to allow us to model all the variants of legislative models we have encountered without the need for extensive programming-level customization.
  • Browser Support – We’re pushing the envelope when it comes to browser support. Our current focus is on Google’s Chrome browser. Support for all the browsers aside from Internet Explorer should be relatively easy. Our experience has shown that the browsers are now quite similar. Internet Explorer is the one exception – in this particular area. Years ago, IE was the best browser when it came to XML support. While IE had many other compatibility issues, particularly with CSS, it led the way in supporting XML. However, while Microsoft has made tremendous strides moving forward to match the other browsers and modern standards, they’ve neglected XML. Their circa 1999 legacy capabilities for XML do no match modern standards and are quite deficient. Hopefully, this is something that will soon be rectified.

It’s not all smooth sailing. I have been finding a number of surprising issues with Google Chrome. For instance, whitespace management is a bit fudged at times. Chrome thinks nothing of adding the occasional non-breaking space to maintain whitespace when editing the DOM. What’s worse – it will inexplicably convert this into a text node that reads ” ” after a while. This is a character entity that is not defined in XML. I have to work hard to constantly reverse this odd behavior.

All in all, I’m excited by this new approach to building a web-based XML editor. It’s a substantial increase in sophistication over our prior web-based XML editor. This editor will be far more robust, scalable, and configurable in comparison to our prior editor and other editors we have worked on. While we still have a way to go in our development, we’ve found solutions to all the risky issues. It’s a future-looking approach – support can only get better. It doesn’t rely on compatibility modes or any other remnants of prior eras in web technology. This approach is really working out quite nicely for us.

Standard
LegisPro Web, Track Changes

Tracking Changes with Legislative Drafting

We’re in the process of rebuilding our legislative editor – from the ground up. There are many reasons why we are doing this, which I will leave to my next blog. Today, I want to focus on the most important reason of all – change tracking.

Figure 1: The example above shows non-literal redlining and two different change contexts. An entire bill section is being added – the “action line” followed by quoted text. Rather than showing the entire text in an inserted notation, only the action line is shown. The quoted text reflects a different change context – showing changes relative to the law. In subsequent versions of this bill, the quoted text will no longer show the law as its change context but rather the prior version. It’s complicated!

For us, change tracking is an essential feature of any legislative editor. It’s not something that can be tacked on later or implemented via a customization – it’s a core feature which must be built in to the base editor from the very outset. Change tracking dictates much of the core architecture of the editor. That means taking the time to build in change tracking into the basic DOM structures that we’re building – and getting them right up front. It’s an amazingly complex problem when dealing with an XML hierarchy.

I’ve been asked a number of questions by people that have seen my work. I’ll try to address them here:

Why is change tracking so important? We use change tracking to implement a couple of very important features. First of all, we use it to implement redlining (highlighting the changes) in a bill as it evolves. In some jurisdictions, particularly in the United States, redlining is an essential part of any bill drafting system. It is used to show both how the legislation has evolved and how it affects existing law.

Secondly, we use it to automatically generate “instruction” amendments (floor or committee amendments). First, page and line markers are back-annotated into the existing bill. That bill is then edited to reflect the proposed changes – carefully crafting the edits using track changes to avoid political sensitivities – such as arranging a change so as not to strike out a legislator’s name. When complete, our amendment generator is used to analyze the redlining along with the page and line markers to produce the amendment document for consideration. The cool thing is that to execute the amendments, all we need to do is accept or reject the changes. This is something we call “Amendments in Context” and our customer calls “Automatic Generation of Instruction Amendments” (AGIA).

How is legislative redlining different from change tracking in Word? They’re very similar. In fact, the first time we implemented legislative redlining, we made the mistake of assuming that they were the same thing. What we learned was that legislative redlining is quite a bit more complex. First of all, the last version of the document isn’t the only change context. The laws being amended are another context which must be dealt with. This means that, within the same document, there are multiple original sources of information which must be compared against.

Secondly, legislative redlining has numerous conventions, developed over decades, to indicate certain changes that are difficult or cumbersome to show with literal redlining. These amount to non-literal redlining patterns which denote certain changes. Examples include showing that a paragraph is being merged or split, a provision is being renumbered, a whole bill is being gutted and replaced with all new text, and even that a section, amending law (creating a different change context), is being added to a new version of the bill.

The rules of redlining can be so complex and intricate that they require any built-in change tracking mechanism in an off-the-shelf editor to be substantially modified to meet the need. Our first legislative editor was implemented using XMetaL for the State of California. At first, we tried to use XMetaL’s change tracking mechanisms. These seemed to be quite well thought out, being based on Microsoft Word’s track changes. However, it quickly became apparent that this was insufficient as we learned the art of redlining. We then discovered, much to our alarm, that XMetaL’s change tracking mechanism was transparent to the developer and could not be programmatically altered. Our solution involved contracting the XMetaL team to provide us with a custom API that would allow us to control the change tracking dimension. The result works, but is very complex to deal with as a developer. That’s why they had hidden it in the first place.

Why can’t differencing be used to generate an amendments document? We wondered this as well. In fact, we implemented a feature, called “As Amends the Law” in our LegisWeb bill tracking software using this approach. But, it’s not that straight-forward. First of all, off-the-shelf differencers lack an understanding of the political sensitivities of amendments. What they produce is logically correct, but can be quite politically insensitive. The language of amendments is often very carefully crafted to not upset one side or another. It’s pretty much impossible to relay this to a program that views its task as simply comparing two documents. Put another way, a differencer will show what has changed rather than how it was changed.

Secondly, off-the-shelf differencers don’t understand all the conventions that exist to denote many of the types of amendments that can be made – especially all the non-literal redlining rules. Asking a legislative body to modify their decade’s old customs to accommodate the limitations of the software is an uphill battle.

What approaches to change tracking have you seen? XMetaL’s approach to change tracking is the most useful approach we’ve encountered in XML editors. As I already mentioned, its goal is to mimic the change tracking capabilities of Microsoft Word. It uses XML processing instructions very cleverly to capture the changes – one processing instruction per deletion and a pair for insertions. The beauty of this approach is that it isolates the challenge of change tracking from the document schema – ensuring wide support for change tracking without any need to adapt an existing schema. It also allows the editor to be customized without regard for the change tracking mechanisms. The change tracking mechanisms exist and operate in their own dimension – very nicely isolated from the main aspects of editing. However, when you need to program software in this dimension, the limited programmability and immense complexity becomes a drawback.

Xopus, a web-based editor, tries to mimic XMetaL’s approach – actually using the same processing instructions as XMetaL. However, it’s an apparent effort to tack on change tracking to an existing editor and the result is limited to only tracking changes within text strings. They’ve seemingly never been able to implement a full featured change tracking mechanism. This limits its usefulness substantially.

Another approach is to use additional elements in a special namespace. This is the approach taken by ArborText. The added elements (nine in all), provide a great deal of power in expressing changes. Unfortunately, the added complexity to the customizer is quite overwhelming. This is why XMetaL’s separate change dimension works so well – for most applications.

Our approach is to follow the model established by XMetaL, but to ensure the programmability we need to implement legislative redlining and amendment generation. In the months to come, I will describe all this in much more detail.

Standard
Akoma Ntoso, HTML5, LegisPro Web, Standards, Track Changes, Transparency, Uncategorized, W3C

Legal Citations and XML Editing for Legislation

It’s been quite some time since my last blog post – almost six months. The reason is that I’ve been very busy. We are doing a lot of exciting development within Xcential. We are developing a number of quite challenging projects around the globe.

If you’ve been following my blog, you may remember that I was working on an HTML5-based XML editor. That development was two years ago now. We’ve come a long way since then. The basic editor has been stripped down, componentized, and has being rebuilt to be a far more robust, scalable, and adaptable solution. There are more details below, which I will expand upon as the editor rolls out over the next year.

    Legal Citations

It was almost a year ago since the last Legislative Data and Transparency Conference in Washington D.C. (The next one is coming up) At that time, I spoke about the need for improved citation management in published XML documents. Well, we’ve come a long way since then. Earlier this year a Technical Committee was formed within OASIS to begin developing some standards. The Legal Citation Markup Technical Committee is now hard at work defining markup models for legal citations. I am a member of that TC.

The reference management part of our HTML5-based editor has been separated out as a separate project – as a citation interpreter and reference resolver. In our development tests, it’s integrated with eXist as a local repository. We also source documents from external sources such as LII.

We now have a few citation management projects underway, using our resolver technology. These are exciting projects which will be a huge step forward in improving how citations are managed. It’s premature to talk about this in any detail, so I’ll just leave this as a teaser of stuff to come.

    XML Editing for Legislation

The OASIS Legal Document ML Technical Committee is getting ready to make a large announcement. While this progress is being made, at Xcential we’ve been hard at work refining the state-of-the-art in XML editing.

If you recall the HTML5-based editor for Akoma Ntoso from a couple of years back, you may remember that is was based around all the new HTML5 technologies that have recently been incorporated into web browsers. We learned a lot from that effort – both good and bad. While we were able to get a reasonable tagging editor, using facilities that made editing far easier, we still faced difficulties when it came to basic XML editing and scalability.

So, we’ve taken a more ambitious approach to produce a very generalized XML editing platform. Using what we learned as the basis, our new editor is far more capable. Rather than relying on the mapping of XML into an equivalent HTML5 structure, we now directly use the XML facilities that are built into the browser. This approach is both far more robust and far more scalable. But the most exciting aspect is change tracking. We’re building change tracking directly into the basic editing engine – from the outset. This means that we can track all changes – whether the changes are in the text or in the structure. With all browsers now correctly implementing the standardized DOM Range model, our change tracking model has to be very sophisticated. While it’s hellishly complex, my experience in implementing change tracking technologies over many years is really coming in handy.

If you’ve used change tracking in XMetaL, you know the limitations of their technology. XMetaL’s range selection constrains how you can select which limits the flexibility of deletion. This simplifies the problem for the XMetaL customizer, but at a serious usability price. It’s one of the biggest limiting factors of XMetaL. We’re dealing with this problem once and for all with our new approach – providing a great way to implement legislative redlining.

Redlining Take a look at the totally contrived example on the left. It’s admittedly not a real example, it comes from my stress testing of the change tracking facilities. But look at what it does. The red text is a complex deletion that spans elements with little regard to the structure. In our editor, this was done with a single “delete” operation. Try and do this with XMetaL – it takes many operations and is a real struggle – even with change tracking turned off. In fact, even Microsoft Word’s handling of this is less than satisfactory, especially in more recent versions. Behind the scenes, the editor is using the model, derived from the schema, to control this deletion process to ensure that a valid document is the result.

If you’re particularly familiar with XMetal, you will notice something else too. That deletion cuts through the structure of a table!!!! XMetaL can only track changes within the text of table cells, not the structure. We’re making great strides towards proper legislative redlining technologies, and we are excited to work with our partners and clients to put them into practice.

Standard