Akoma Ntoso, Standards, W3C

Automating Legal References in Legislation

This is a blog post I have wanted to write for quite some time. It addresses what I believe to be the single most important issue when modeling information for legal informatics. It is also, I believe, the most urgent aspect that we need to agree upon in order to promote legal informatics as a real emerging industry. Today, most jurisdictions are simply cobbling together short-term solutions without much consideration for the big picture. With something this important, we need to look at the big picture first and come up with a lasting solution.

Citations, references, or links are a very important aspect of the law. Laws are inherently a web of interconnections and interdependencies. Correctly resolving those connections allows us to correctly interpret the law. Mistakes or ambiguities in how those connections are made are completely unacceptable.

In addition to my work on the OASIS LegalDocumentML technical committee, I work on projects around the world. As I travel to the four corners of the Earth, I am starting to see more clearly how this problem can be solved in a clean and extensible manner.

There are, of course, already many proposals to address this. The two I have looked at the most are both from Italy:
A Uniform Resource Name (URN) Namespace for Sources of Law (LEX)
Akoma Ntoso References (in the process of being standardized by OASIS)

My thoughts derive from these two approaches, both of which I have implemented in one way or another, with varying degrees of success. My earliest ideas were quite similar to the LEX-URN proposal in being based around URNs. However, over time Fabio Vitali at the University of Bologna has convinced me that the approach he and Monica Palmirani put forth with Akoma Ntoso, using URLs, is more practical. While URNs have their appeal, they have never achieved the critical mass of adoption needed to be practical. Also, the general reaction I have gotten to LEX-URN encoded references has not been positive. There is just too much special encoding going on within them for them to be readable by the uninitiated.

Requirements

Before diving into this subject too deep, let’s define some basic requirements. In order to be effective, a reference must:
• Be unambiguous.
• Be predictable.
• Be adaptable to all jurisdictions, legal systems, and all the quirks that arise.
• Be universal in application and reach.
• Be implementable with current tools and technologies.
• Be long lasting and not tied to any specific implementation.
• Be understandable to mere mortals like myself.

URI/IRI

URIs (Uniform Resource Identifiers) give us a way to identify resources in a computing system. We’re all familiar with URLs that allow us to retrieve pages across the web using hierarchical locations. Less well known are URNs which allow us to identify resources using a structured name which presumably will then be located using some form of a service to map the name to a location. The problem is, a well-established locating service has never come about. As a result, URNs have languished as an idea more than a tool. Both URLs and URNs are forms of URIs.

IRIs are a generalization of URIs to allow characters outside of the ASCII character set supported by normal URIs. This is important in jurisdictions that use character sets more complex than ASCII supports.

Given the current state of the art in software technology, basing references on URIs/IRIs makes a lot of sense. Using the URL (locator) variant is the safer and more universally accepted approach.

FRBR

FRBR, the Functional Requirements for Bibliographic Records, is a conceptual entity-relationship model developed by librarians for modeling bibliographic information in databases. In recent years it has received a fair amount of attention as the basis for legal references. In fact, both the LEX-URN and the Akoma Ntoso models are based, somewhat loosely, on it. At times, there is some controversy as to whether this model is appropriate or not. My intent is not to debate the merits of FRBR. Instead, I simply want to acknowledge that it provides a good overall model for thinking about how a legal reference should be constructed. In FRBR, there are four main entities:
1. Work – The work is the “what”, allowing us to specify what it is that we are referring to, independent of which version or format we are interested in.
2. Expression – The expression answers the “from when” question, allowing us to specify, in some manner, which version, variant, or time frame we are interested in.
3. Manifestation – The manifestation is the “which format” part, where we specify the format that we would like the information returned as.
4. Item – The item finally allows us to specify the “from where” part, when multiple sources of the information are available, that we want the information to come from.

That’s all I want to mention about FRBR. I want to pick up the four concepts and work from them.
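
To make these four entities concrete before moving on, here is a minimal sketch of how they might map onto the parts of a reference. The interface and field names are my own illustration (in TypeScript), not anything defined by FRBR, LEX-URN, or Akoma Ntoso:

    // A hypothetical shape for a legal reference, organized along FRBR lines.
    interface LegalReference {
      work: string[];          // the "what": e.g. ["us-ca", "codes", "gov", "sec500"]
      expression?: string;     // the "from when": a version or date, e.g. "2012-01-01"
      manifestation?: string;  // the "which format": a file extension, e.g. "pdf"
      item?: string;           // the "from where": a source domain, e.g. "leginfo.ca.gov"
    }

    const example: LegalReference = {
      work: ["us-ca", "codes", "gov", "sec500"],
      expression: "2012-01-01",
      manifestation: "pdf",
    };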

What do we want?

Following the Akoma Ntoso approach of specifying a reference as a URL, and mindful of our basic requirements, a useful way to reference a resource is as a hierarchical URL that starts with the jurisdiction and works its way down to the item in question.

This brings me to the biggest hurdle I have come across when working with the existing proposals. It’s not terribly clear what a reference should look like when the item being referenced is a sub-part of a resource being modeled as an XML document. For instance, how would I refer to Section 500 of the California Government Code? Without putting in too much thought, the answer might be something like /us-ca/codes/gov.xml#sec500, using a URL to identify the Government Code followed by a fragment identifier specifying Section 500. The LEX-URN proposal actually suggests using the # fragment identifier, referring to the fragment as a partition.

There are two problems with this solution though. First, any browser will interpret a reference using the fragment identifier as two parts – the part before the # showing the resource to be retrieved from the server and the part after it as an “id” of the item to scroll to. Retrieving the huge Government Code when all we want is the one sentence in Section 500 is a terrible solution. The second problem is that it defines, possibly for all time, how a large document happens to have been constructed out of sub-documents. For example, is the US Code one very large document, does it consist of documents made out of the Titles, or, as it is quite often modeled, is every section a different document? It would be better if references did not capture any part of this implementation decision. A better approach is to allow the “what” part of a reference to be specified as a virtual URL all the way down to whatever is wanted, even when the “what” is found deep inside an XML document in a current implementation. For example, the reference would better be specified as /us-ca/codes/gov/sec500. We’re not exposing in the reference where the document boundaries currently exist.
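
As a sketch of what this looks like in code (the segments follow my California example and the function is purely illustrative), the “what” part of a reference can be assembled without any knowledge of where document boundaries happen to fall:

    // Build the virtual "what" path for a reference. Nothing here reveals whether
    // the Government Code is stored as one huge document or as thousands of small ones.
    function workPath(segments: string[]): string {
      return "/" + segments.join("/");
    }

    workPath(["us-ca", "codes", "gov", "sec500"]);
    // => "/us-ca/codes/gov/sec500"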

On to the next issue, what happens when there is more than one possible way to reference the same item? For example, the sections in California’s codes, as is usually the case, are numbered sequentially with little regard to the heading hierarchy above the sections. So a reference specified as /us-ca/codes/gov/sec500 is clear, concise, and unambiguous. It follows the manner in which sections are cited in the text. But /us-ca/codes/gov/title1/div3/chap6/sec500 is simply another way to identify the exact same section. This happens in other places too. /us-ca/statutes/2012/chap5 is the same document as /us-ca/bills/2011/sb730. So two paths identify the same document. Do we allow two identities? Do we declare one as the canonical reference and the other as an alternate? It’s not clear to me.

What about ambiguity? Mistakes happen and odd situations arise. Take a look at both Chapter 14s that exist in Division 6 of Title 1 of the California Government Code. There are many reasons why this happens. Sometimes it’s just a mistake and sometimes it’s quite deliberate. We have to be able to support this. In California, we disambiguate by using “qualifying language” which we embed somehow into the reference. The qualifying language specifies the last statute to create or amend the item needing disambiguation.

The “From When” do we want it?

A hierarchical path identifies, with some disambiguation, what it is we want. But chances are that what we want has varied over time. We need a way to specify the version we’re looking for, or to ask for the version that was valid at a specific point in time. Both the LEX-URN and the Akoma Ntoso proposals suggest using an “@” sign followed by some nomenclature that identifies a version or date. (The Akoma Ntoso proposal adds the “:” sign as well.)
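
As a small, hypothetical sketch of the convention (the function name is mine, not part of either proposal):

    // Append the "from when" part using the "@" convention.
    function withVersion(workPath: string, versionOrDate: string): string {
      return `${workPath}@${versionOrDate}`;
    }

    withVersion("/us-ca/codes/gov/sec500", "2012-01-01");
    // => "/us-ca/codes/gov/sec500@2012-01-01"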

A problem does arise with this approach though. Sometimes we find that multiple versions exist on a particular date. These versions are all in effect, but based on some conditional logic, only one might be operational at a particular time. How one deals with operational logic can be a bit tricky at times. That’s still an open issue for me.

Which Format do we want?

I find specifying the format to be relatively uncontroversial. The question is whether we specify the format using well-established extensions such as .pdf, .odt, .docx, .xml, and .html, or whether we instead try to be more precise by embedding or encoding the MIME type into the reference. Personally, I think that simple extensions, while less rigorous and subject to unfortunate variations and overlaps, are far more likely to be adopted than any attempt to use the MIME type somehow. Simple generally wins over more rigorous but more complex solutions.
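
Even if the reference itself carries only a simple extension, a resolver can translate that extension into a precise MIME type when it builds its response. A sketch, with an illustrative mapping table:

    // Map the simple extensions used in references to MIME types on the resolver side.
    const mimeTypes: Record<string, string> = {
      pdf:  "application/pdf",
      odt:  "application/vnd.oasis.opendocument.text",
      docx: "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
      xml:  "application/xml",
      html: "text/html",
    };

    function mimeTypeFor(extension: string): string {
      return mimeTypes[extension] ?? "application/octet-stream";
    }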

The “From Where” should it come?

This last part, the from where should it come part, is something that is often omitted from the discussion. However, in a world where multiple libraries offering the same resource will quite likely exist, this is really important. Let’s take a look at the primary example once more. We want Section 500 of the California Government Code. The reference is encoded as /us-ca/codes/gov/sec500. Where is this information to come from? Without a domain specified, our URL is a local URL, so the presumption is that it will be locally resolved – the local system will find it, somehow. What if we don’t want to rely on a local resolution function? What if there are numerous sources of this data and we want to refer to one of them in particular? When we prepend the domain, aren’t we specifying where we want the information to come from? So if we say http://leginfo.ca.gov/us-ca/codes/gov/sec500, aren’t we now very precisely specifying the source of the information to be the official California source? Now, say the US Library of Congress decides to extend Thomas to offer state legislation. If we want to specify that copy, we would simply construct a reference as http://thomas.loc.gov/us-ca/codes/gov/sec500. It’s the same URL after the domain is specified. If we leave the URL as simply /us-ca/codes/gov/sec500, we have a general reference and we leave it to the local system to provide the resolution service for retrieving and formatting the information. We probably want to save references in a general fashion without a domain, but we certainly will need to refer to specific copies within the tools that we build.
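
In code, the difference between a general reference and a source-specific one is simply whether a domain has been prepended (a sketch, reusing the same illustrative domains):

    // A general reference is resolved locally; prepending a domain pins it to a source.
    function atSource(domain: string | null, reference: string): string {
      return domain ? `http://${domain}${reference}` : reference;
    }

    atSource(null, "/us-ca/codes/gov/sec500");              // general: resolved locally
    atSource("leginfo.ca.gov", "/us-ca/codes/gov/sec500");  // pinned to the official California source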

Resolvers

The key to making this all work is having resolvers that can interpret standardized references and find a way to provide the correct response. It is important to realize that these URLs are all virtual URLs. They do not necessarily resolve to files that exist. It is the job of the resolving service either to construct the valid response, possibly by digging into databases and files, or to negotiate with other resolvers that might do all or part of the job of providing a response. For example, imagine that Cornell University offers a resolver at http://lii.cornell.edu. It might, behind the scenes, work with the official data source at http://leginfo.ca.gov to source California legislation. Anyone around the world could use the Cornell resolver and be unaware of the work it is doing to source information from resolvers at the official sources around the world. So the local system would be pointed to the Cornell service, and when the reference /us-ca/codes/gov/sec500 arose, the local system would defer to the LII service for resolution, which in turn would defer to California’s official resolver. In this way, the resolvers would bear the burden of knowing where all the official data sources around the world are located.
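
Conceptually a resolver is simple: answer the reference if you can, otherwise hand it to another resolver that is closer to the official source. The sketch below only illustrates that delegation chain; the class names and the local cache are assumptions of mine, not part of any proposal:

    // A resolver either answers a reference from its own holdings
    // or delegates to an upstream resolver (e.g. the official source).
    interface Resolver {
      resolve(reference: string): Promise<Response | null>;
    }

    class DelegatingResolver implements Resolver {
      constructor(
        private local: Map<string, Response>,  // locally held or cached answers
        private upstream?: Resolver            // e.g. the jurisdiction's official resolver
      ) {}

      async resolve(reference: string): Promise<Response | null> {
        const held = this.local.get(reference);
        if (held) return held;
        return this.upstream ? this.upstream.resolve(reference) : null;
      }
    }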

Examples

So to end, I would like to sum up with some examples:

[Note that the links are proposals, using a modified and simplified form of the Akoma Ntoso proposal, rather than working links at this point]

/us-ca/codes/gov/sec500
– Get section 500 of the California Government Code. It’s up to the local service to decide where and how to resolve the reference.

http://leginfo.ca.gov/us-ca/codes/gov/sec500
– Get Section 500 of the California Government Code from the official source in California.

http://lii.cornell.edu/us-ca/codes/gov/sec500
– Get Section 500 of the California Government Code from Cornell’s LII and have them figure out where to get the data from

/us-ca/codes/gov/sec500@2012-01-01
– Get Section 500 of the California Government Code as it existed on January 1, 2012

/us-ca/codes/gov/sec500@2012-01-01.pdf
– Get Section 500 of the California Government Code as it existed on January 1, 2012, in a PDF format

/us-ca/codes/gov/title1/div3/chap6/sec500
– Get Section 500 of the California Government Code, but with the full hierarchy specified

This post has gotten very long, and I have only just begun to scratch the surface. I haven’t addressed multilingual issues, alternate character sets, and a host of other issues at all. It should already be apparent that this is all simply a natural extension of the URLs we already use, but with sophisticated services underneath resolving to items other than simple files. Imagine for a moment how the field of legal informatics could advance if we could all agree to something this simple and comprehensive soon.

What do you think? Are there any other proposals, solutions, or prototypes out there that address this? How does the OASIS LegalDocumentML work factor into this?

Standards

Legal Informatics Glossary of Terms

I work with people from around the world on matters relating to legal informatics. One common issue we constantly face is the issue of terminology. We use many of the same terms, but the subtle differences in their definitions end up causing no end of confusion. To try and address this problem, I’ve proposed a number of times that we band together to define a common vocabulary, and where we can’t arrive at one, at least come to understand the differences that exist amongst us.

To get the ball rolling, I have started a wiki on GitHub and populated it with many of the terms I use in my various roles. Their definitions are a work-in-progress at this point. I am refining them as I find the time. However, rather than trying to build my own private vocabulary, I would like this to be a collaborative effort. To that end, I am inviting anyone with an interest in this to help build out the vocabulary by adding your own terms with definitions to the list and improving the ones I have started.

My legal informatics glossary of terms can be found in my public Legal Informatics project at:

https://github.com/grantcv1/Legal-Informatics/wiki/Glossary

The wiki is a public project on GitHub. Right now, anyone can contribute. We’ll see how well this model works out. In order to contribute, you need to sign up for a free GitHub account and master the basics of GitHub. For the purposes of managing a vocabulary, it’s quite simple. You will need to understand the markdown format of the text file that is behind the list. The built-in editor in GitHub makes editing the markdown quite simple. If you are so inclined, you can learn more about markdown at http://daringfireball.net/projects/markdown/syntax. GitHub will take care of all the versioning issues, so feel free to edit the terminology file.

Eventually I would like to gather enough terms that common terms or clusters of terms can be identified. This will allow us to develop clearer and more understandable standards, tools, and documentation in the emerging areas of legal informatics.

Akoma Ntoso, Standards

“A Bill is a Bill is a Bill!”

I remember overhearing someone exclaim “A Bill is a Bill is a Bill” many years ago. What she meant is that a bill should be treated as a bill in the same way regardless of the system it flows through.

I’m going to borrow her exclamation to refer to something a bit different by rephrasing it slightly. Is a bill a Bill and an akn:bill? Lately I’ve been working in a number of different jurisdictions, and through my participation in OASIS and the LEX Summer School, with many other people from even more jurisdictions. It sometimes seems that everything bogs down when it comes to terminology. In the story of Babel, God is said to have divided the world into different languages to thwart the people’s plans to build the tower of Babel. Our problem isn’t that we’re all speaking different languages. Rather it’s that we all think we’re speaking the same language but our definitions are all different.

The way the human brain works means that we learn incrementally. I remember someone once telling me that you can only know that which you almost knew before. When we try to learn a new thing, we cling to the parts we think we understand and our brain filters out the things we don’t. Ever learn a new word and then realize that it’s being used all around you every day – but you seemingly had never heard it before? So what happens when we recognize terms that are familiar, but fail to notice the details that would tell us that the usage isn’t quite what we expect? We misinterpret what we learn and then we make mistakes. Usually, through a slow process of trial and error, we learn from our mistakes and eventually we master the new subject. This takes time, limiting our effectiveness until our competency rises to an adequate level.

Let’s get back to the notion of a bill. Unfortunately in legislation, there is often a precise definition and an imprecise definition for a term and we don’t always know which is being used. What’s worse, if we’re not indoctrinated in the terminology, we might never realize that there is ambiguity in what we are hearing. For instance, I know of three different definitions for the word “bill”:

  1. The first usage is a very precise definition and it describes a document, introduced in a legislative body, that proposes to modify existing law. I will call this document a Bill (with a capital B). In California, all legislative documents which modify law are Bills. Subdivision (b) of Section 8 of Article 4 of the California Constitution defines it this way. At the federal level, the same definition of a bill exists, except that another document, the Joint Resolution, has similar powers to enact law, but this document is not a Bill.
  2. The second definition is the much looser definition that applies the word to any official documents acted upon by a legislature. I will use the term bill (with a lower-case b) when referring to this usage. In California, the precise term that is synonymous with bill is Measure. At the federal level, the precise term used is either Measure or Resolution. Of course, this opens up more confusion. In California, Measures are either Bills (and will affect law when enacted) or are Resolutions which express a statement from the legislature or house without directly affecting the law. So now the Federal Resolution is a synonym for a Measure while the California Resolution is a subclass of it. The Federal equivalent of a California Resolution is a Non-binding Resolution.
  3. The third definition is the Akoma Ntoso definition of a bill, which I will refer to as the akn:bill. At first glance, it appears to equate with the precise definition of a Bill. It is defined as a document that affects the law upon enactment. But this equivalence breaks down. The akn:bill applies more broadly than the precise definition of a Bill but not as broadly as the imprecise definition of a bill. So an akn:bill applies to the federal Joint Resolution, Constitutional Amendments, and California Initiatives, along with the precise notion of a Bill.

I can summarize all this by saying that all Bills are akn:bills, and all akn:bills are bills, but not all bills are akn:bills, and not all akn:bills are Bills.
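
One way to picture these subset relationships is as a small type hierarchy. The sketch below is only an illustration of the containment described above; the type names are mine, not Akoma Ntoso’s:

    // bill (lower-case): any measure acted upon by a legislature.
    interface bill { title: string; }

    // akn:bill: a measure that affects the law upon enactment (Bills, federal
    // Joint Resolutions, constitutional amendments, initiatives).
    interface AknBill extends bill { affectsLawOnEnactment: true; }

    // Bill (capital B): the precise sense -- introduced in a legislature to modify existing law.
    interface CapitalBBill extends AknBill { introducedInLegislature: true; }

    // Every CapitalBBill is an AknBill and every AknBill is a bill,
    // but neither of the reverse implications holds.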

As if this isn’t confusing enough, other terms are even more overloaded. As I already alluded to, the term resolution is quite overloaded and the term amendment is even worse. Even the distinction between a bill and an act is unclear. Apparently a Bill, when it has passed one house, technically becomes an Act even before it passes the opposite house, but the imprecise term bill generally continues to be used.

To try and untangle all this confusion and allow us to communicate with one another more effectively, I have started a spreadsheet to collect all the terms and definitions I come across during my journey through this subject. My goal is to try and find the root concepts that are hidden underneath the vague and/or overloaded terminology we use and hopefully find some neutral terms which can be used to disambiguate communication between people coming from different legislative traditions. The beginning of my effort can be found at:

https://docs.google.com/spreadsheet/ccc?key=0ApeIHP2TOckZdHRPZ1pIeVRUcVpJZzZQT1BCSHFqd0E

Please feel free to contribute. Send me any deviations and additions you may have. And if you note an error in my understanding, please let me know. I am still learning this myself.

Akoma Ntoso, Hackathon, HTML5, Standards, W3C

Update on our Web-based Legislative Editor

It’s been a while since my last blog post. I’ve been busy working on a number of activities. As a result, I have a lot of news to announce regarding the web-based editor, previously known as the AKN/Editor, that we originally built for the “Unhackathon” back in May.

As you might already have guessed, it has a new name. The new name is “LegisPro Web” which now more clearly identifies its role and relationship to Xcential’s XMetaL-based standalone editor “LegisPro”. Going forward, we will be migrating much of the functionality currently available in LegisPro into LegisPro Web.

Of course, there is now a new web address for the editor – http://legisproweb.com. As before, the editor prototype is freely available for you to explore at this address.

As I write this blog post early this Sunday morning, I am in Ravenna, Italy, where I just participated in the LEX Summer School 2012 put on by the University of Bologna. On Monday, the Akoma Ntoso Developers’ Workshop starts at the same venue. In addition to listening to the other developers present their work, I will be spending an afternoon presenting all the ins and outs of the LegisPro Web editor. I’m excited to have the opportunity to learn about the other developers’ experiences with Akoma Ntoso and to share my own experiences building a web-based XML editor.

Last month we demonstrated the LegisPro Web editor at the National Conference of State Legislatures’ (NCSL) annual summit in Chicago. It was quite well received. I remain surprised at how much interest there is in an editor that is targeted at a tagging role rather than an editing role.

Of course, there has been a lot of development of the editor going on behind the scenes. I have been able to substantially improve the overall stability of the editor and its compliance with Akoma Ntoso, as well as add significant new functionality. As I become more and more comfortable and experienced with the new HTML5 APIs, I am starting to build up a good knowledge base of how best to take advantage of these exciting new capabilities. Particularly challenging for me has been learning how to intuitively work with the range selection mechanism. The origins of this mechanism are loosely related to the similar mechanism that is available within XMetaL. While I have used XMetaL’s ranges for the past decade, the HTML5 mechanisms are somewhat more sophisticated. This makes them correspondingly harder to master.
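
For those curious about the range mechanism I mention, here is a minimal sketch of the standard Selection and Range APIs the editor builds upon. It is not the editor’s actual code, just an illustration of the primitives:

    // Wrap the user's current selection in an inline element -- the basic move
    // behind most formatting and tagging commands in an editor like this.
    function wrapSelection(tagName: string): void {
      const selection = window.getSelection();
      if (!selection || selection.rangeCount === 0) return;

      const range = selection.getRangeAt(0);
      if (range.collapsed) return;                    // nothing is selected

      const wrapper = document.createElement(tagName);
      wrapper.appendChild(range.extractContents());   // lift the selected content out
      range.insertNode(wrapper);                      // and re-insert it, wrapped

      const restored = document.createRange();        // leave the wrapped content selected
      restored.selectNodeContents(wrapper);
      selection.removeAllRanges();
      selection.addRange(restored);
    }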

And perhaps the most exciting news of all is that the editor now has some customers. I’m not quite ready to announce who they are, but they do include a major government entity in a foreign country. As a result of this win, we will be expanding our Akoma Ntoso coverage to include Debates and Debate Reports in addition to the Bill and Act document types we currently support. In addition, we will be adding substantial new capabilities which are desired by our new customers. I should also mention that Ari Hershowitz (@arihersh) has joined our team and will be taking the lead in delivering the customized solution to one of our customers.

Alongside all this development, work continues at the OASIS LegalDocumentML Technical Committee. Look to see me add support for the Amendment document type to the editor in support of this activity in the not-too-distant future.

All in all, I think we’re making terrific progress bringing the LegisPro Web editor to life. I started work on the editor as a simple idea a little more than six months ago. We used the “Unhackathon” in May to bootstrap it to life. Since then, it’s taken off all on its own and promises to become a major part of our plans to build a legitimate legal informatics industry around an Akoma Ntoso based standard.

Akoma Ntoso, Standards

Legislative Legos

I just got back from a vacation in Denmark – the land where Legos come from. I thought it might be appropriate to spend some time talking about what Flavio Zeni at UNDESA calls Legislative Legos – building blocks that can be used to build legislative systems.

Why is this such an important concept? Well, it’s all about managing risk and building a system that can adapt to change in a timely manner. Legislative systems can take years and cost many millions of dollars to develop. As the sophistication grows and automation takes root throughout the organization, these systems become extensive, mission-critical parts of the organization’s fabric. So failure is not an option.

At the same time as legislative systems become so ingrained, technology cycles are shrinking. Rapid changes in technology ensure that the expected lifecycle of any part of a system will be compressed. Spending five years of a technology cycle to perfect a large system chews up a lot of that technology’s viable life as well. And as change accelerates, the technology waves are being blurred into streams of constant change, so catching a technology wave at exactly the right moment becomes impossible.

Imagine for a moment choosing to build a brand new, built-from-scratch system. You’re looking at a multi-million dollar proposition that will take at least 3-5 years to come to fruition. The technologies you choose will all be at various stages of maturity. Some technologies might be well established and stable – but they might also be waning and likely to soon be obsolete. The risk there is building a system now that will be obsolete by the time it is deployed. On the other hand, other technologies will be on the bleeding edge. Choosing to use them might maximize the lifecycle of the result, but it does so at the risk of using a technology that might not yet be ready for prime time. When building a large scale system, the likelihood of all the technologies being optimal for adoption at the same time you need them is very remote.

Think about the infrastructure of a city. The road system has all sorts of problems. Traffic jams, potholes, and aging infrastructure abound. Under the streets are a rat’s nest of utilities from different eras. It might be very tempting to want to simply demolish the city and build a new one. But that’s a preposterous idea. Not only would it not be affordable, the inconvenience and risk would be unthinkable. This becomes true of all systems as they become large and ingrained. They are no longer replaceable. Instead they must evolve constantly – in as smooth and efficient a manner as can be achieved.

So how must legislative systems evolve? They must evolve by being built as interoperable modules. These modules must stand on their own and be able to be deployed, updated, and replaced on their own timetable. These modules must be defined in such a way as to maximize the likelihood that expertise gained elsewhere can be harnessed to reduce risk and lower costs. These modules must be sufficiently independent of the rest of the system that should they fail or become obsolete they do not jeopardize everything else. This is all simply good risk management.

So let’s think about Legos for a moment. When I was a kid they were my favorite toy and I was the champ at the Lego competitions in the small town where I grew up. I would spend hours and hours building all sorts of different things out of them. A consistent standard for how the blocks connect is what made this possible. That standard doesn’t dictate how the blocks go together or what you can build with them. Instead, Lego defines a simple protocol that allows the blocks to connect and then they provide enough variety in their blocks to allow your imagination to run wild.

We need the same concept for legislative systems. Instead of large monolithic solutions that are likely either to fail or to become obsolete, but are always expensive, we need a simple set of protocols that will allow modules to be built and then mixed and matched to solve all the varying requirements that exist for legislative systems.

So what is the legislative analogue of the Lego standard? First we need the basic protocol for how things connect. That is where XML and something like Akoma Ntoso come in. Together they define the most basic parameters of how the information can be connected. Next we need to define what the basic pieces are. We need an editor, a repository, some sort of metadata management, various publishing engines, and so on. Those are the blocks. Not every system will need the same blocks and not every block will fit every application. With enough forethought and enough industry cooperation, blocks can be made to fit together in a variety of ways to solve every need.
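
To make the blocks a little more concrete, here is a sketch of what the seams between such modules might look like. The interfaces are entirely hypothetical; the point is only that each block exchanges documents in the common format rather than exposing its internals:

    // Each "Lego" exposes a small surface built around the common document format.
    type AkomaNtosoXml = string;   // a document serialized in the shared standard

    interface Repository {
      get(reference: string): Promise<AkomaNtosoXml>;
      put(reference: string, doc: AkomaNtosoXml): Promise<void>;
    }

    interface Editor {
      open(doc: AkomaNtosoXml): void;
      save(): AkomaNtosoXml;
    }

    interface PublishingEngine {
      render(doc: AkomaNtosoXml, format: "html" | "pdf"): Promise<Uint8Array>;
    }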

I come from the electronic CAD industry. Our early generation systems in the 1980s were large monolithic solutions. As a vendor, we loved the idea because we locked our customers into our solution. Their investment in our system became so large that replacing us was inconceivable. All we had to do was have our professional services people show up at a prospect, convince them to choose our product suite, and then accept all the custom code we would build to cement the relationship forever. This worked great until our product became obsolete. Not only could our customers not get out from under the monstrosity we had built, neither could we. Smaller companies with new innovative ideas started chipping away at our fortress (that’s what we called it) and we couldn’t adapt. When we tried, the result was more pieces stuck to our monolith, which only made the problem worse. Monolithic system thinking nearly killed the business and made a lot of customers very angry. Today that same industry is made up of a myriad of much smaller, less tightly coupled products, available from a large selection of vendors, and bound by common industry standards. The Lego-inspired model made the industry much stronger and served the customers far better. The design efficiency that results is what makes things like smartphones and tablets, updated every few months, even possible.

The same thing must happen for legislative systems. Modular Legislative Legos will allow this field to flourish – to everyone’s benefit. So let’s work together to make this happen.

HTML5, Standards, W3C

Why Not Build a Legislative Editor out of Google Docs?

Ever since I started working on my legislative editor (http://legalhacks.org/editor), I’ve been asked over and over if I was using Google Docs, and if not, why not.

So to answer the first part of the question, the answer is a simple no. I don’t use Google Docs or anything like it.

There are a number of good reasons why I take a different path. The first reason is that Google simply doesn’t open Google Docs to that type of customization. Even if the customization capability was there, there are still plenty of reasons why choosing to build a legislative editor around Google Docs would not necessarily make sense. We can start with the privacy and security concerns with storing legislation in Google’s cloud. Let’s set aside that level of concern to focus more on the technical issues of the editor itself.

We’ll start by considering the difference between a word processor and an XML editor. When done right, an XML editor should superficially look just like a word processor. With a lot of effort, an XML editor can also be made to feel like a word processor. When I was implementing XMetaL for the California Legislature, the goal was very much to achieve the look and feel of a word processor. That is possible, but only to an extent.

There is a big difference between a word processor and an XML editor. Just because modern word processors now save their data in an XML format does not make them XML editors. If you take a look at their file formats, OOXML for Microsoft Word or ODF for Open Office, you’ll see very complex and very fixed schemas that are far more oriented around presentation than one typically desires in an XML document. In fact, we try to separate presentation from structure in XML documents while OOXML and ODF blend them together. That’s what is at the heart of the difference between a word processor and an XML editor. In a word processor you worry about things like page breaks, margins, fonts, and various other attributes of the document that make it “pretty”. In an XML document, typically all of that is going to be added after the fact using a “formula” rather than being customized into each document. So while what you see in an XML editor might look WYSIWYG, it’s actually more like “somewhat WYSIWYG” and your ability to customize the formatting is quite constrained. This approach focuses you on the content and structure of the document and allows the resulting information to be targeted to many different form factors for publication. By not dictating the formatting of the document, the publication engine is more free to choose the best layout for any particular publication form – be it web, paper, mobile, or whatever.

When I explain this, the next question is whether or not my implementation is similar to how Google Docs is implemented. The answer is, again, a simple no. Google Docs’ implementation is dramatically different from the approach that I take. My approach relies on the new APIs being added to the browsers, in a consistent standardized way under the HTML5 umbrella, that allow the text content to be selected, edited, dragged about, and formatted. While these capabilities existed in earlier browsers to some extent, the manner in which they were supported was in no way consistent. This made supporting multiple browsers really difficult, and the resulting application would be a patchwork of workarounds and quirks. Even then, the browser variations would result in inconsistent behaviors between the browsers, and the support and maintenance task would be a nightmare. It is for this reason, along with a need for page-oriented layout features, that Google abandoned the approach I am taking – though at a time when the standards that help ensure consistency were lacking.

So how does Google Docs do it then? How come they didn’t get bogged down in a sea of browser incompatibilities amongst all the legacy browsers they support? They do it by creating their own editing surface entirely in JavaScript – something codenamed “kix”. Rather than relying on the browser for very much, they instead have their own JavaScript implementation of every feature they need for the editing surface. That is how they are able to implement rulers, pages, and drag boxes in ways you’ve just never seen in a browser before. It’s an amazing accomplishment and it allows them to support a wide range of browsers with a single implementation. That’s what you can do when you’re Google and have deep pockets. It’s a very expensive and very complex solution to the problem I am attempting to solve with modern standards. So while I can reasonably support the future alone, with no baggage from the past, they’re able to support both future and past browsers by skipping the baggage and instead having their own custom implementation of everything. While I’m amazed at Google Docs’ accomplishment, attempting a similar thing with an XML editor would be cost and time prohibitive. Keep in mind that a lot of the capabilities of Google Docs’ editing surface deal with presentation aspects of a document, something that is of less concern to the typical XML document.

I’ve been working in XML editors for over 10 years now. Over the years, I’ve spent a lot of time wondering how one might implement a real web-based XML editor. Every time my thoughts went beyond wondering and moved towards considering, I quickly discovered that the limitations of divergent implementations of the base technology would make the project impractical. That’s partly what drove Google to spend the big bucks on a custom editing surface for their word processor. Now, however, HTML5 is beginning to make a web-based XML editor a practical reality. Don’t mistake me, it’s still very difficult. Figuring out how to keep an XML document and an HTML5 view synchronized is not a simple task. While the browsers have all come a long way, they all still have their own weaknesses. Drag and drop has been broken in Safari since 5.1.2. Opera’s selection mechanism breaks when you toggle the contentEditable attribute right now. But these are problems that will disappear with time. As the standards are implemented and the bugs are fixed, I can already see how much HTML5 is going to change the application landscape. I would think long and hard about returning to traditional application development given what I now know about HTML5.
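
As a small illustration of the synchronization problem (a sketch of one possible approach, not how any particular editor actually does it): if every element in the HTML5 view carries the id of the XML node it was generated from, a MutationObserver can at least tell you which parts of the XML document have been touched and need to be rebuilt from the view:

    // Watch an editable HTML5 view and record which source XML nodes are now stale.
    function trackDirtyNodes(view: HTMLElement, dirty: Set<string>): MutationObserver {
      const observer = new MutationObserver((mutations) => {
        for (const m of mutations) {
          const node = m.target instanceof Element ? m.target : m.target.parentElement;
          const mapped = node?.closest("[data-xml-id]");   // the view element mirroring an XML node
          const id = mapped?.getAttribute("data-xml-id");
          if (id) dirty.add(id);                           // that XML node must be regenerated
        }
      });
      observer.observe(view, { subtree: true, childList: true, characterData: true });
      return observer;
    }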

Akoma Ntoso, Standards

Why is Building a Legislative Editor so difficult?

Way back in 2002, when I first started building an editor for legislation, the problem looked so simple. We figured it was a quick six month project and we would then have a product we could resell to other jurisdictions. I remember when we first made our modest proposal, having it pushed back at us and being told we needed to go and rethink it. California had already learned, the hard way, that the deceptively simple problem of building a legislative editor was actually much more difficult and complex. What I have come to appreciate, over time, is that California’s approach actually makes the problem easier. It’s even harder elsewhere where the drafting rules are less structured and well thought out. I actually lucked out in starting with California, and it still took over three years to get a deployed solution in place – and that was only the beginning.

So what makes building a legislative editor so difficult? First of all, you need to understand what a bill really is. If you’re a software developer, the easiest way to understand it is to think of it as a document to “patch” existing law. Quite confusingly to software types, that process of patching the law results in something called “compiling codes”. So bills are patch documents containing intricate instructions for how to modify the law. What’s so hard about that? Well, for starters, the law has been built up over many decades or centuries and has accumulated all sorts of errors and odd situations that have to be accommodated. How do you correctly refer to something when, quite by accident, two different things have been numbered the same? How do you deal with the situation that arises when the law is defined to change in some complex temporal way? What happens when something is referred to by one number before a particular date, and then by a different number after that date? Quickly, all these complexities start accumulating on top of your clean and simple architecture, and the hope that you’ll ever be able to solve it starts turning into a panic that perhaps you’ve attempted a problem that cannot be solved.
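
For the software developers reading this, it may help to see the “patch” analogy spelled out. The sketch below is grossly simplified and entirely invented for illustration; real amendatory language is far messier than a tidy instruction record:

    // A bill, at its core, carries instructions like these against existing law.
    interface AmendingInstruction {
      target: string;                       // e.g. "/us-ca/codes/gov/sec500"
      action: "amend" | "repeal" | "add";
      newText?: string;                     // the replacement or added text
    }

    // "Compiling the codes": apply the instructions to the current body of law.
    function compile(law: Map<string, string>, instructions: AmendingInstruction[]): void {
      for (const i of instructions) {
        if (i.action === "repeal") law.delete(i.target);
        else law.set(i.target, i.newText ?? "");
      }
    }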

Let’s add another layer of complexity to the problem. The patch documents, which legislative types like to call bills, are also patched – or amended to use the real terminology. So you’ve got to deal with patches of patches. As a bill winds through the process of becoming law, it might be amended many times. Tracking all those amendments is going to be important to a lot of people. Keeping it all straight is crucial to producing an eventual result that patches the law accurately. Those legislative types aren’t very understanding when you get it wrong.

So now that you understand the complexity of the process, let’s add some more complexity. Legislative traditions were defined long before there were computers. The procedures were defined for an era where everything was done using pen and paper. Amendments were made using scissors and glue. In fact, scissors and glue were still a part of the amendment process until quite recently in California. Other jurisdictions still have to resort to these procedures even today. Unfortunately, it is simply not possible to replace these outdated traditions with modern computer-based methodologies overnight. Legal traditions evolve on a deliberately conservative path. Good luck trying to throw out the old way just because the new way will be easier for the software. Chances are, you’re going to have to adapt more to the current process than the current process is going to adapt to you.

So what other aspects of legislative bill drafting are complex? I could go on and on about this subject. First of all, you’re not going to be able to convince the government to throw out all the existing laws and start again. You’re just going to have to deal with all the strange ways things were done in the past. If they numbered things with vulgar fractions (e.g., 1/2, 1/3) once upon a time, you’re going to have to deal with that. If they believe that 100.15 reads as “one hundred point fifteen” and comes somewhere between “one hundred point ten” and “one hundred point twenty”, except when it is found between “one hundred point one” and “one hundred point two”, then you’re going to have to accept that too. Then there is the whole subject of redlining. Don’t be fooled into believing that redlining is the same as track changes in a word processor. It might look the same and be conceptually similar, but chances are there is a lot more hidden meaning in those strikes and inserts than you realize. When you get to the subject of making references or citations, you’ll discover just how much ambiguity there is in the subject. Quite often, a reference to a person is by last name alone. The only way to know which person with that last name is being referred to is to understand the context in terms of location and time. Even then, you might have to deal with ambiguity. When you get to the subject of tables, you’ll want to throw your arms in the air and walk away. Amending tables is an almost intractable problem. If you’re lucky, page and line numbers won’t come up at all. How on earth do you represent page and line numbers in a document that is explicitly designed to not worry about the physical aspects of the published form?
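
The numbering example is worth a concrete illustration. A comparator that honors the usual legislative reading, where the part after the point is an ordinal (“point fifteen”) rather than a decimal fraction, might look like the sketch below. It deliberately ignores the context-dependent exception I described, which cannot be decided without looking at the neighboring sections:

    // Compare section numbers like "100.15", treating the suffix as an ordinal,
    // so that 100.2 < 100.10 < 100.15 < 100.20 (unlike decimal ordering).
    function compareSectionNumbers(a: string, b: string): number {
      const [aWhole, aSuffix = "0"] = a.split(".");
      const [bWhole, bSuffix = "0"] = b.split(".");
      return (Number(aWhole) - Number(bWhole)) || (Number(aSuffix) - Number(bSuffix));
    }

    ["100.20", "100.2", "100.15", "100.10"].sort(compareSectionNumbers);
    // => ["100.2", "100.10", "100.15", "100.20"]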

Okay, so you have found your way through all of that and now your editor is ready to go. Whew, that was hard! Oh, but now they want to be able to tear apart and rearrange the document in arbitrary ways that make no sense with the XML hierarchy you’ve defined. Of course, you’re expected to be able to piece together the resulting mess to produce a valid document. Well, they can’t do that. That’s just the way it is, and you’ll tell them that when they tell you they’re not happy with the editor. Alright, that’s not a very good answer. Well, back to the drawing board on that. Building an editor that conforms both to the software notions of how a document should be structured and to how a non-technical user perceives it to be structured is a very difficult problem.

Legislative XML editors can be built. There are a few successful examples to show that it is possible. I’m proud to have muddled through to success with California, but it was very difficult. At the same time, there have been plenty of failures and projects that have gone way over budget. Having a standardized model, such as is promised by Akoma Ntoso and the standards effort within OASIS, should help by allowing this problem to be solved only a few times and then reused many times afterwards. However, given all the variations that exist, even with a standard in place this is going to be a difficult problem to solve for the foreseeable future.

Akoma Ntoso, HTML5, Standards, W3C

A Pluggable XML Editor

Ever since I announced my HTML5-based XML editor, I’ve been getting all sorts of requests for a variety of implementations. While the focus has been, and continues to be, providing an Akoma Ntoso based legislative editor, I’ve realized that the interest in a web-based XML editor extends well beyond Akoma Ntoso and even legislative editors.

So… with that in mind I’ve started making some serious architectural changes to the base editor. From the get-go, my intent had been for the editor to be “pluggable” although I hadn’t totally thought it through. By “pluggable” I mean capable of allowing different information models to be used. I’m actually taking the model a bit further to allow modules to be built that can provide optional functionality to the base editor. What this means is that if you have a different document information model, and it is capable of being round-tripped in some way with an editing view, then I can probably adapt it to the editor.

Let’s talk about the round-tripping problem for a moment. In the other XML editors I have worked with, the XML model has had to quite closely match the editing view that one works with. So you’re literally authoring the document using that information model. Think about HTML (or XHTML from an XML perspective). The arrangement of the tags pretty much exactly represents how you think of and deal with the components of the document. Paragraphs, headings, tables, images, etc., are all pretty much laid out the way you would author them. This is the ideal situation as it makes building the editor quite straightforward.

However, this isn’t always the case. How far this is from being the case determines how feasible building an editor is at all. Sometimes the issues are minor. For instance, in Akoma Ntoso, a section’s “num” element is out-of-line with the content block containing the paragraphs. So while it is quite common for the num to be inline in the first paragraph of the section, that isn’t how Akoma Ntoso chooses to represent it. And it gets more difficult from there when you start dealing with subsections and sub-subsections.

To deal with these sorts of issues, a means of translating back and forth between what you’re editing and the information model you’re building is needed. I am using XSL Transforms, designed specifically for round-tripping, to solve the problem. Not every XML model lends itself to document authoring, but by building a pluggable translating layer I’m able to adapt to more models than I have been able to in the past.
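
In the browser, the forward half of this round trip can lean on the standard XSLTProcessor API. The sketch below shows only the model-to-view direction; a second stylesheet, not shown, performs the reverse translation when the document is saved. The file names are placeholders:

    // Transform the XML information model into the HTML used by the editing view.
    async function toEditingView(xmlUrl: string, xslUrl: string): Promise<DocumentFragment> {
      const [xmlText, xslText] = await Promise.all([
        fetch(xmlUrl).then(r => r.text()),
        fetch(xslUrl).then(r => r.text()),
      ]);

      const parser = new DOMParser();
      const xmlDoc = parser.parseFromString(xmlText, "application/xml");
      const xslDoc = parser.parseFromString(xslText, "application/xml");

      const processor = new XSLTProcessor();
      processor.importStylesheet(xslDoc);                     // the model-to-view transform
      return processor.transformToFragment(xmlDoc, document);
    }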

Along with these mechanisms I am also allowing for pluggable command structures, CSS styling rules, and, of course, the schema validation. In fact, the latest release of the editor at legalhacks.org has been refactored and now somewhat follows this pluggable architecture.

Next I plan to start working with modules like change tracking / redlining, metadata generation (including XBRL generation), and multilingual support following this pluggable architecture. I’m totally amazed at how much more capable HTML5 is turning out to be when compared to old-fashioned HTML4. I can finally build the XML editor I always wanted.

Akoma Ntoso, HTML5, Standards, Transparency

XBRL in Legislation

Over the past few weeks, my posts about the HTML5 editor I have been working on have received a lot of attention. One aspect that I have noticed throughout has been people equating legislative markup to XBRL. In fact, I have taken to explaining Akoma Ntoso as being like XBRL for legislation. That helps people better understand what we are up to.

Or does it? It’s a little misleading to describe a legislative information model as being like XBRL for legislation. The reason is that many of the people interested in the transparency aspects of legislative XML are interested in it precisely because they’re interested in tracking budget appropriations. The assumption being made is that the XML will somehow make it possible to track the money that flows through legislation.

In some respects XML does help track money. Certainly, reporting budgets as XML is a whole lot better than the other approach you often see – tables reported as images. Images are a very effective way to hide appropriations from machine processing. However, that’s less and less of a problem. Nowadays, if you take a look at any budget appropriation embedded within legislation, you’re likely to find the numbers reported in a table. Most likely that table will be in the form of an HTML table at that. How you interpret that table is up to you. Perhaps the CSS class names for each cell will provide some guidance as to each cell’s content, but the information is being reported in a manner intended for human consumption rather than machine processing. In short, the manner in which financial information is reported in legislation is not for the sake of improving the transparency of the data.

In California, when we designed the budget amending facilities within the bill drafting system, our objective was to find a way to draft a budget amendment with the limited tabular capabilities of XMetaL. Whether the result was transparent or not was not a consideration as it was not a requirement six years ago. Victory for us was finding a way to get the immediate job done. Elsewhere, I have yet to see any legislative information model attempt to address the issue of making budget appropriations more transparent. Rather, the focus is instead on things like the temporal aspects of amending the law, the issues that arise in a multilingual society, or ensuring the accuracy and authenticity of the documents.

So what is the solution? I must profess to know very little about XBRL. What I do know tells me that it is not a suitable replacement for all those HTML tables that we tend to use. XBRL is a complex data format normalized for database consumption. The information is not stored in a manner that would allow for easy authoring by a human author or easy reading by a human reader. I did find one article from three years back that begins to address the issue. Certainly we’ve come far on the subject of legislative XML and it’s time to reconsider this subject.

The good news is that we do have a solution for integrating XBRL with legislative XML. Within an Akoma Ntoso document is a proprietary section found inside the metadata block. This section is set aside specifically to allow foreign information models to be embedded within the legislative document. So, much as the metadata already contains analysis sections for recording the temporal aspects of the legislation, it is quite easy to add an XBRL section to record the financial aspects of the legislation.

So the next question is whether or not XBRL is designed to be embedded within another document. It would seem that the answer is yes – and it is called inline XBRL. While the specification addresses fragments of XBRL within HTML documents, I don’t see why this cannot be extended to a legislative information model.  Simply put, a fragment of inline XBRL data would be embedded within the metadata block of the legislative document recorded in XML. This data would be available for any XBRL processor to discover (how is another question) and consume. The inline XBRL would be produced prior to publication by analyzing the legislative XML’s tables used to draft the document.
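
As a rough sketch of what that embedding might look like programmatically, here is one way the fragment could be attached to the metadata block from script. The namespace URI, the element and attribute names, and the placement under the proprietary block are assumptions on my part and would need to be checked against the inline XBRL and Akoma Ntoso specifications:

    // Insert an inline XBRL fact into the proprietary metadata block of an
    // Akoma Ntoso document held in memory as a DOM.
    const IX = "http://www.xbrl.org/2008/inlineXBRL";   // assumed inline XBRL namespace

    function addAppropriationFact(aknDoc: Document, amount: string): void {
      const proprietary = aknDoc.getElementsByTagName("proprietary")[0];
      if (!proprietary) throw new Error("no proprietary metadata block found");

      const fact = aknDoc.createElementNS(IX, "ix:nonFraction");
      fact.setAttribute("name", "budget:Appropriation");  // hypothetical taxonomy concept
      fact.setAttribute("contextRef", "FY2012");          // hypothetical reporting context
      fact.setAttribute("unitRef", "USD");
      fact.textContent = amount;

      proprietary.appendChild(fact);
    }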

Ideally, the XBRL data would be hooked directly to the legislative XML data, much like spreadsheet formulas can be attached to data, but maybe I’m getting ahead of myself. Providing budget appropriation information within inline XBRL embedded within the legislative XML would be a great step forward – and it would achieve the objectives that the people who are interested in the transparency aspects of legislative XML actually have.

I’m certainly no expert in XBRL, so I’m interested in hearing what people in the know have to say about this. Let me know. If you know of an appropriations taxonomy for XBRL, let me know. And if you’re interested in following how the DATA Act might influence this subject, check out the Data Transparency Coalition.

Akoma Ntoso, Hackathon, HTML5, Standards, W3C

An HTML5-Based XML Editor for Legislation!

UPDATE: (May 17, 2012) For all the people that asked for more editing capabilities, I have updated the editor to provide rudimentary cut/copy/paste capabilities via the normal shortcut keys. More to follow as I get the cycles to refine the capabilities.

I’ve just released my mini-tutorial for the HTML5-based XML editor I am developing for drafting legislation (actually it’s best for tagging existing legislation at this point).


Please keep in mind that this editor is still very much in development – so it’s not fully functional and bug-free at this point. But I do believe in being open and sharing what I am working on. We will be using this editor at our upcoming International Legislation Unhackathons (http://legalhacks.org) this coming weekend. The editor is available to experiment with at the legalhacks.org site.

There are three reasons I think that this approach to building editors is important:

  1. The editor uses an open standard for storing legislative data. This is a huge development. The first couple of generations of legislative systems were built upon totally proprietary data formats. That meant that all the data was locked into fully custom tools that were built years ago and could only be understood by those systems. Those systems were very closed. The last decade saw the development of the XML era of legislative tools. This made it possible to use off-the-shelf editors, repositories, and publishing tools. But the XML schemas that everyone used were still largely proprietary, and that meant everyone still had to invest millions of dollars in semi-custom tools to produce a workable system. The cost and risk of this type of development still put the effort out of reach of many smaller legislative bodies.

    So now we’re moving to a new era, tools based on a common open standard. This makes it possible for an industry of plug-and-play tools to emerge, reducing the cost and risks for everyone. The editor I am showing uses Akoma Ntoso for its information model. While not yet a standard, it’s on a standards track at the OASIS Standards Consortium and has the best chance of emerging as the standard for legal documents.

  2. The editor is built upon open web standards. Today you have several choices when you build a legislative editor. First, you can build a fully custom editor. That’s a bad idea in this day and age when there are so many existing editors to build upon. So that leaves you with the choice of building your editor atop a customizable XML editor or customizing the heck out of a word processor. Most XML editors are built with this type of customization in mind. They intrinsically understand XML and are very customizable. But they’re not the easiest tools to master – for either the developer or the end user. The other approach is to use a word processor and bend and distort it into being an XML editor. That is well beyond the original intent of a word processor, and dealing with the mismatch between the missions of a word processor and a legislative drafting tool leaves lots of room for issues in the resulting legislation.

    There is another problem as well with this approach. When you choose to customize an off-the-shelf application, you have to buy into the API that the tool vendor supplies. Chances are that API is proprietary and you have no guarantee that they won’t change it on a whim. So you end up with a large investment in software built on an application API that could become almost unrecognizable with the next major release. So while you hope that your investment should be good for 10-12 years, you might be in for a nasty surprise at a most inopportune time well before that.

    The editor I have built takes a different approach. It builds upon the W3C standards that are emerging around HTML5. These APIs are standards, so they won’t change on a whim – they will be very long lived. If you don’t like a vendor and want to change, doing so is trivial. I’m not just saying this. The proof is in the pudding. This editor works on all four major browsers today! This isn’t just something I am planning to support in the future; it is something I already support. Even while the standards are still being refined, this editor already works with all the major browsers. (Opera is lagging behind in support for some of the application APIs I am using.) Can you do that with an application built on top of Microsoft Office? Do you want to switch to Open Office with an application you built? You’re going to have to rewrite your application.

  3. Cloud-based computing is the future. Sure, this trend has been obvious for years, but the W3C finally recognizes the web-based application as being more than just a sophisticated website. That recognition is going to change computing forever. Whether your cloud is public or private, the future lies in web-based applications. Add to that the looming demands for more transparent government and open systems that facilitate real public participation, and it becomes obvious that the era of the desktop application is over. The editor I am building anticipates this future.

  4. I’ve been giving a lot of thought to where this editor can go. As the standards mature, I learn to tame the APIs, and the browsers finish the work remaining for them, it seems that legislative drafting is only the tip of the iceberg for such an approach to XML-based editing. Other XML models such as DITA and XBRL might well be other areas worth exploring.

    What do you think? Let me know what ideas you have in this area.
