Uncategorized

Comparing DOCX to Akoma Ntoso for Legislation

After describing what makes for good legislative XML, I feel I should bring up a favorite topic of mine — why word processors don’t make for good legislative drafting tools.

Lately, we’ve been implementing round tripping tools to allow Akoma Ntoso documents to be imported and exported from Microsoft Word. This is to facilitate migration from a largely office productivity-oriented system to an XML-based one and to allow the exchange of documents with external clients that don’t have access to the internal systems being used to draft and manage legislation. It’s been quite a difficult process. The round-tripping itself has been quite straight forward. Exporting a document is relatively easy and reimporting that exported document, unchanged, isn’t difficult. What is very problematic is trying to ingest documents drafted or extensively edited using a word processor. The DOCX markup quickly becomes a tangled mess. Even when a document looks fine visually, there can be a lot going wrong on the inside, revealing the drafter’s struggle with the word processor to get a document that at least looks right. To avoid the problematic mess, we tend to resort to interpreting the words and discarding the structure and internal metadata entirely. It’s not perfect, but it’s at least manageable.

I’m going to compare the prominent word processing format today, DOCX (well, at least the WordprocessingML part of it) to Akoma Ntoso in respect to how they stack up to each other on my list:

Is it semantic?
DOCX: No, not at all. DOCX is a serialization of the inner workings of Microsoft Word. It makes no attempt to be anything else.
Akoma Ntoso: Yes, this is the fundamental approach Akoma Ntoso takes.
Is the presentation separated from the semantics as much as possible?
DOCX: No, the presentation is tied directly into the document itself, and what’s more, is very proprietary.
Akoma Ntoso: Yes, although you can apply presentation directly inline in cases, such as tables, where necessary.
Is all the text (excluding any metadata section) in the natural reading order?
DOCX: Yes, for the most part.
Akoma Ntoso: Yes, for the most part.
Does it, to the fullest extent possible, avoid the use of generated text?
DOCX: No, and this is one of the most frustrating and infuriating parts of working with DOCX.
Akoma Ntoso: Mostly, but it doesn’t preclude practices that ensure this rule is followed.
Is every provision that needs data associated with it permanently identifiable?
DOCX: Mostly.
Akoma Ntoso: Yes, via the @wId or the @GUID attributes.
Is every provision that is referred to easily locatable?
DOCX: Not without extensive customization.
Akoma Ntoso: Yes, via a standardized locator mechanism using the @eId/@wId attributes.
If the XML schema is for general use, is there an extensible way to add missing constructs?
DOCX: No, unless you regard styling as your constructs (a bad idea) or want a complex customization task.
Akoma Ntoso: Yes, via the seven elements found in the generic model.
Is there an extensible metadata mechanism?
DOCX: Yes, but it’s complicated.
Akoma Ntoso: Yes, but it’s complicated.
Does it provide the facilities necessary to automate according to modern expectations?
DOCX: No, the presentation oriented structure of DOCX does little to enable downstream automation.
Akoma Ntoso: Yes, Akoma Ntoso encourages a hierarchical content structure that is ideal for downstream automation.

Of course, Akoma Ntoso looks a lot better for legislative documents than does DOCX files. That should be no surprise — Akoma Ntoso is purpose-built for legislation while DOCX is a general purpose document model intended for no single purpose. But it is also fundamentally very different. While Akoma Ntoso is designed to be in modern standards-based document information model for legislation, DOCX is a serialization of the archaic data structures that exist within Microsoft Word. DOCX reflects the proprietary inner workings of Microsoft Word rather than the semantic meanings to be found within a document.

Akoma Ntoso has its drawbacks too. It’s complex, a bit academic, and has to span a very broad range of legal traditions make it a good fit for most legislative traditions, but a perfect fit for none.

Akoma Ntoso, Standards, technology, Uncategorized

What is Good Legislative XML?

I’m often asked what make on XML model better than another when it comes to representing laws and regulations. Just because a document is modeled in XML does not mean that it is useful in that form — the design of the schema matters in terms of what it enables or facilitates.

We have a few rules of thumb that we apply when either designing or adopting an XML schema:

Is it semantic?
Reason: In order to process the information in a document, you have to understand what it is and what it means.
Is the presentation separated from the semantics as much as possible?
Reason: We have moved beyond paper and nowadays it’s important to present information in form factors that just don’t suit the legacy constraints imposed by printing paper.
Is all the text (excluding any metadata section) in the natural reading order?
Reason: The simplest way to present and process the text in a document is in the reading order of the text. This is particularly important is the presentation is to be added to the XML using simple CSS styling (as opposed to HTML transformation) and when the text is subject to complex amending instructions.
Does it, to the fullest extent possible, avoid the use of generated text?
Reason: Similar to the last rule, it’s important for text to be displayed or amended when that text is represented. Generating text opens up a can of worms which can require sophisticated additional processing. Also, from a historical record of the text, which is essential for enacted law, having part of the text be generated by an external algorithm requires that the algorithm itself become part of the permanent record.
Is every provision that needs data associated with it permanently identifiable?
Reason: With modern automation comes the need to not only manage the text of a provision but also state information. For example, is the current status of the provision pending, effective, repealed, or spent? While some of the metadata might be stored with the XML representation of the provision itself, sometimes it is better to store that metadata in a separate part of the document or in an external database. In these cases, it’s important to be able to permanently associate this external metadata with the provision — and this usually requires an immutable (permanent) identifier.
Is every provision that is referred to easily locatable?
Reason: Laws are full of references (or citations). These are to provisions within the same document or to other documents or provisions within those documents. There needs to be a way to accurately and efficiently traverse and process these references. This need usually requires a locating identifier that an unambiguously identify the provision being referred to.
If the XML schema is for general use, is there an extensible way to add missing constructs?
Reason: It is easy to claim to support all the legal traditions in the word, but extremely difficult to do so. While legal traditions are remarkably similar around the world, it’s impossible to predict every single construct that will arise — especially with documents data back hundreds of years. There has to be a way to implement constructs that don’t intrinsically exist within the base XML schema.
Is there an extensible metadata mechanism?
Reason: A primary objective for representing a legislative or regulatory document in XML is for the processability it enables. This invariably means a need to record extensive metadata about the provisions found within the document. As the automation possibilities are endless, there needs to be a way to model and record the metadata that is generated.
Does it provide the facilities necessary to automate according to modern expectations?
Reason: Some structure facilitate automation while others do not. For instance, flat structures can simplify the drafting process, but also make the automation process more difficult. It’s usually better to implement hierarchical structures and then hide the drafting complexity that creates with richer tools.

Uncategorized

LegisPro edit will soon be ready for beta!

Our new rulemaking LegisPro^edit is coming along nicely. It’s a web-based XML drafting tool specifically designed for the rigors of rulemaking tasks such as legislative bill drafting. It supports both the Akoma Ntoso and the USLM legislative models and can be customized to support any other model if necessary.

This past week I gave a demonstration of it at the LEX US Summer School at George Mason University in Washington D.C. With trepidation, I allowed everyone to have a hands-on experience with it as I provided guidance. This was the first time the editor had been used by anyone outside of Xcential and the first time we had stressed server performance. While certainly not glitch free, the editor exceeded my expectations for this point in the development process and all went well. It worked!

This week we are talking about the editor at NCSL by way of a screenshot demo that I am sharing here:

The next opportunities to try the editor hands-on will be at the LEX Summer School in Ravenna, Italy next month and we will also be showing it later in the month at NALIT in Sacramento, California.

The QuickStarter beta program is still in the process of being finalized. We are currently envisioning different levels of participation, from basic beta testing to a full-fledged evaluation program for anyone looking to use it or a part of it in an upcoming project.

More information can be found at http://xcential.com/legispro or you can contact us at info@xcential.com.

Akoma Ntoso, HTML5, LegisPro Web, LEX Summer School, Standards, Transparency

Data Transparency Breakfast, LEX US Summer School 2015, First International Akoma Ntoso Conference, and LegisPro Edit reveal.

Last week was a very good week for my company, Xcential.

We started the week hosting a breakfast put on by the Data Transparency Coalition at the Booz Allen Hamilton facility in Washington D.C.. The topic was Transforming Law and Regulation. Unfortunately, an issue at home kept me away but I was able to make a brief pre-recorded presentation and my moderating role was played by Mark Stodder, our company President. Thank you, Mark!

Next up was the first U.S. edition of the LEX Summer School from Italy. I have attended this summer school every year since 2010 in Italy and it’s great to see the same opportunity for an open dialog amongst the legal informatics community finally come to the U.S. Monica Palmirani (@MonicaPalmirani), Fabio Vitali, and Luca Cervone (@lucacervone) put on the event from the University of Bologna. The teachers also included Jim Mangiafico (@mangiafico) (the LoC data challenge winner), Veronique Parisse (@VeroParisse) from the European Union, Andrew Weber (@atweber) from the Library of Congress, Kirsten Gullickson (@GullicksonK) from the Office of the Clerk at the U.S. House of Representatives, and myself from Xcential. I flew in for an abbreviated visit covering the last two days of the Summer School where I covered how the U.S. Code is modeled in Akoma Ntoso and gave the students an opportunity to try out our new bill drafting editor — LegisPro^edit.

After the Summer School concluded, it was followed by the first International Akoma Ntoso Conference on Saturday, where I spoke about the architecture of our new editor as well as how the USLM schema is a derivative of the Akoma Ntoso schema. We had good turnout, from around the world, and a number of interesting speakers.

This week is NCSL in Seattle where we will be discussing our new editor with potential customers and partners. Mark Stodder from Xcential will be in attendance.

In a month, I’ll be in Ravenna once more for the European LEX Summer School — where I’ll be able to show even more progress towards the goal of a full product line of Akoma Ntoso tools. It’s interesting times for me.

The editor is coming along nicely and we’re beginning to firm up our QuickStarter beta plans. I’ve already received a number of requests and will be getting in touch with everyone as soon as we’re ready to roll out the program. If you would like to participate as a beta tester — or if you would just like more information, please contact us at info@xcential.com.

I’m really excited about how far we’ve come. Akoma Ntoso is on the verge of being certified as an official OASIS standard, our Akoma Ntoso products are coming into place, and interest around the world is growing. I can’t wait to see where we will be this time next year.

Akoma Ntoso, HTML5, LegisPro Web, LEX Summer School, Standards, Track Changes, W3C

Coming soon!!! A new web-based editor for Akoma Ntoso

I’ve been working hard for a long time — building an all new web-based editor for Akoma Ntoso. We will be showing it for the first time at the upcoming Akoma Ntoso LEX Summer School in Washington D.C.

Unlike our earlier AKN/Editor, this editor is a pure XML editor designed from the ground up using the XML capabilities that modern browsers possess. This editor is much more robust, more precise, and is very scalable.

Basic Features

Configurable XML models — including Akoma Ntoso and USLM
Edit full documents or portions of large documents
Flexible selection and editing regardless of XML structure
Built-in redlining (change tracking) supporting textual AND structural changes
Browse document sources with drag-and-drop.
Full undo & redo
Customizable attribute editor
Search and replace
Modular architecture to allow for extensive customization

Underlying Technology

XML-based editing component
- DOM 4 support
- XPath Support
- CSS Styling
- Sophisticated event model
HTTP-based resolver architecture for retrieving documents
- Interpret citations
- Deference URLs
- WebDAV adaptors to document repositories
- Query repositories with XQuery or databases with SQL
AngularJS-based User Interface using HTML5
- Component modules for easy customization
XML repository for storing documents
- Integrate any XML repository
- Built-in support for eXist-db
Validation & Publishing
- XML Schema validator
- XSL-FO publishing

We’ll reveal a lot more at the LEX Summer School later this month! If you’re interested in our QuickStart beta program, drop me a note at grant.vergottini@xcential.com.

Akoma Ntoso, LegisPro Web, LEX Summer School, Standards, Track Changes

Akoma Ntoso (LegalDocML) is now available for public review

It’s been many years in the making, but the standardised version of Akoma Ntoso is now finally in public review. You can find the official announcement here. The public review started May 7th and will end on June 5th — which is quite a short time for something so complex.

I would like to encourage everything to take part in this review process, as short as it is. It’s important that we get good coverage from around the world to ensure that any use cases we missed get due consideration. Instructions for how to comment can be found here.

Akoma Ntoso is a complex standard and it has many parts. If you’re new to Akoma Ntoso, it will probably be quite overwhelmed. To try and cut through that complexity, I’m going to try and give a bit of an overview of what the documentation covers, and what to be looking for.

There are four primary documents

Akoma Ntoso Version 1.0 Part 1: XML Vocabulary — This document is the best place to start. It’s an overview of Akoma Ntoso and describes what all the pieces are and how they fit together.
Akoma Ntoso Version 1.0 Part 2: Specifications — This is the reference material. When you want to know something specific about an Akoma Ntoso XML element or attribute, this is the document to go to. In contains very detailed information derived from the schema itself. Also included with this is the XML schema (or DTD if you’re still inclined to use DTDs). and a good set of examples from around the world.
Akoma Ntoso Naming Convention Version 1.0. This document describes two very interrelated and important aspects of the proposed standard — how identifitiers are assigned to elements and how IRI-based (or URI-based) references are formed. There is a lot of complexity in this topic and it was the subject to numerous meetings and an interesting debate at the Coco Loco restaurant in Ravenna, Italy, one evening while being eaten by mosquitoes.
Akoma Ntoso Media Type Version 1.0 — This fourth document describes a proposed new media type that will be used when transmitting Akoma Ntoso documents.

This is a lot of information to read and digest in a very short amount of time. In my opinion, the best way to try and evaluate Akoma Ntoso’s applicability to your jurisdiction is as follows:

First, look at the basic set of tags used to define the document hierarchy. Is this set of tags adequate. Keep in mind that the terminology might not always perfectly align with your terminology. We had to find a neutral terminology that would allow us to define a super-set of the concepts found throughout the world.
If you do find that specific elements you need are missing, consider whether or not that concept is perhaps specific to your jurisdiction. If that is the case, take a look at the basic Akoma Ntoso building blocks that are provided. While we tried to provide a comprehensive set of elements and attributes, there are many situations which are simply too esoteric to justify the additional tag bloat in the basic standard. Can the building blocks be used to model those concepts?
Take a look at the identifiers and the referencing specification. These parts are intended to work together to allow you to identifier and access any provision in an Akoma Ntoso document. Are all your possible needs met with this? Implicit in this design is a resolver architecture — a component that parses IRI references (think of them as URLs) and maps to specific provisions. Is this approach workable?
Take a look at the basic metadata requirements. Akoma Ntoso has a sophisticated metadata methodology behind it and this involves quite a bit of indirection at times. Understand what the basic metadata needs are and how you would model your jurisdictions metadata using this.
Finally, if you have time, take a look at the more advanced aspects of Akoma Ntoso. Consider how information related to the documents lifecycle and workflow might be modeled within the metadata. Consider your change management needs and whether or not the change management capabilities of Akoma Ntoso could be adapted to fit. If you work with complex composite documents, take a look at the mechanisms Akoma Ntoso provides to assemble composite documents.

Yes, there is a lot to digest in just a few weeks. Please provide whatever feedback you can.

We’re also now in the planning stages for a US LEX Summer School. If you’ve followed my blog over the years, you’ll know that I am a huge fan of the LEX Summer School in Ravenna, Italy — I’ve been every year for the past five years. This year, Kirsten Gullikson and I convinced Monica and Fabio to bring the Summer School to Washington D.C. as well. The summer school will be held the last week of July 2015 at George Mason University. The class size will be limited to just 30, so be sure to register early once registration opens. If you want to hear me rattle on at length about this subject, this is the place to go — I’ll be one of the teachers. The Summer School will conclude with a one day Akoma Ntoso Conference on the Saturday. We’ll be looking for papers. I’ll send out a blog with additional information as soon as it’s finalized.

You may have noticed that I’ve been blogging a lot less lately. Well, that’s because I’ve been heads down for quite some time. We’ll soon be in a position to announce our first full Akoma Ntoso product. It’s an all new web-based XML editor that builds on our experiences with the HTML5 based AKN/Editor (LegisPro Web) that we built before.

This editor is composed of four main parts.

First, there is a full XML editing component that works with pure XML — allowing it to be quite scalable and very XML precise. It implements complex track changes capabilities along with full redo/undo. I’m quite thrilled how it has turned out. I’ve battled for years with XMetaL’s limitations and this was my opportunity to properly engineer a modern XML editor.
Second, there is a sophisticated resolver technology which acts as the middleware, implementing the URI scheme I mentioned earlier — and interfacing with local and remote document resources. All local document resources are managed within an eXist-db repository.
Third, there is the Akoma Ntoso model. The XML editing component is quite schema/model independent. This allows it to be used with a wide variety of structured documents. The Akoma Ntoso model adapts the editor for use with Akoma Ntoso documents.
And finally, there is a very componentised application which ties all the pieces together. This application is written as an AngularJS-based single page application (SPA). In an upcoming blog I’ll detail the trials and tribulations of learning AngularJS. While learning AngularJS has left me thinking I’m quite stupid at times, the goal has been to build an application that can easily be extended to fit a wide variety of structured editing needs. It’s important that all the pieces be defined as modules that can either be swapped out for bespoke implementations or complemented with additional capabilities.

Our current aim is to have the beta version of this new editor available in time for the Summer School and Akoma Ntoso conference — so I’ll be very heads down through most of the summer.

Akoma Ntoso, Standards

And now for something completely different… Chinese!

Last week we saw how Akoma Ntoso can be applied to a very large consolidated Code – the United States Code. This week we take the challenge in a different direction – applying Akoma Ntoso to a bilingual implementation involving a totally different writing system. Our test document this week is the Hong Kong Basic Law. This document serves as the constitutional document of the Hong Kong Special Administrative Region of the People’s Republic of China. It was adopted on the 4 April 1990 and went into effect on July 1, 1997 when the United Kingdom handed over the region to the People’s Republic of China.

The Hong Kong Basic Law is available in English, Traditional Chinese, and Simplified Chinese. For our exercise, we are demonstrating the document in English and in Traditional Chinese. (Thank you to Patrick for doing the conversion for me.) Fortunately, using modern technologies, supporting Chinese characters alongside Latin characters is quite straightforward. Unicode provides a Hong Kong supplementary character set to handle characters unique to Hong Kong. The biggest challenge is ensuring that all the unicode declarations throughout the various XML and HTML files that the information must flow through are set correctly. With the number of accents we find in names in California as well as the rigorous nature of California’s publishing rules, getting Unicode right is something we have grown accustomed to.

While I hadn’t expected there to be any problems with Unicode, I was pleasently surprised to find that the fonts used in Legix simply worked with the Traditional Chinese characters without issue as well. (Well at least as far as I can tell without the ability to actually read Chinese)

The only issue we encountered was Internet Explorer’s support for CSS3. Apparently, IE still does not recognize “list-style-type” with a value of “cjk-ideographic”. So instead of getting Traditional Chinese numerals, we get Arabic numerals. The other browsers handled this much better.

So what other considerations were there? A big consideration was the referencing mechanism. To me, modeling how you refer to something in an information model can be more important than the information model itself. The referencing mechanism defines how the information is organized and allows you to address a specific piece of information in a very precise and accurate way. Done right, any piece of information can be accessed very quickly and easily. Done wrong and you get chaos.

Our referencing mechanism relies on the Functional Requirements for Bibliographical Records (FRBR). This mechanism is used by both SLIM and Akomantoso. Another interesting FRBR proposal for legislation can be found here.

FRBR defines an information model based on a hierarchical scheme of Work-Expression-Manifestion-Item. Think of the work as the overall document being addressed, the expression being the version desired, the manifestation the format you want to information presented in, and finally the item as a means for addressing a specific instance of the information. Typically we’re only concerend with Work-Expression-Manifestation.

For a bilingual or multilingual system, the “expression” part of the reference is used to specify which language you wish the document to be returned in. If you check out the references at Legix.info you will see that the two references the the Hong Kong Basic Law are:

The expressions are called out as “doc;en-uk” for the English version and “doc;zh-yue” for the Chinese version. Relatively straightforward. The manifestations are not shown and the result is the default manifestation of HTML.

Check the samples out and let me know what you think.

Legis Info

Applying Legislative Technologies

Tag Archives: XML Schema