Akoma Ntoso, LegisPro Web, Standards, Transparency, W3C

Free Akoma Ntoso Validator

How are people doing with the Library of Congress’ Akoma Ntoso Challenge? Hopefully, you’re making good progress, having fun doing it, and, in the process, learning a valuable new skill with this important emerging technology.

I decided to make it easy for someone without an XML editor to validate their Akoma Ntoso documents for free. We all know how expensive XML editors tend to be. If you’re like me, you’ve used up all the free trials you could get. I’ve separated the validation part of our LegisPro Web editor from its editing base to allow it to be used as a standalone validator. Now, all you need to do is either provide a URL to your document or, even easier, drop the text into the text area provided and then click on the “Validate” button. You don’t even need to go find a copy of the Akoma Ntoso schema or figure out how to hook it up to your document – I do all that for you.

To use the validator, simply draft your Akoma Ntoso XML document, specifying the appropriate namespace using the @xmlns namespace declaration, and then paste a copy into the validator. I’ll go and find the schema and then validate your document for you. The validation results will be shown to you conveniently inline within your XML source to help you in making fixes. Don’t worry, we don’t record anything when you use the validator – it’s completely anonymous and we keep no record of your document.
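
For anyone starting from scratch, here is a minimal sketch of the kind of skeleton the validator expects. The namespace shown is the 3.0 draft namespace used elsewhere on this blog; substitute the namespace for whichever version you are targeting, and keep in mind that a real document needs a complete metadata block before it will validate:

<akomaNtoso xmlns="http://docs.oasis-open.org/legaldocml/ns/akn/3.0/CSD03">
   <bill>
      <meta>
         <!-- identification, publication, and other required metadata go here -->
      </meta>
      <body>
         <!-- your marked-up bill text goes here -->
      </body>
   </bill>
</akomaNtoso>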

You can validate either the 2.0 version of Akoma Ntoso or the latest 3.0 version, which reflects the work of the OASIS LegalDocumentML committee. Actually, there are quite a few other formats that the validator will handle innately and, by using xsi:schemaLocation, you can point to any XML schema you wish.
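
If you want to validate against some other schema entirely, the standard xsi:schemaLocation hint is the way to tell the validator where to find it. A quick sketch, with a made-up namespace and schema URL standing in for your own:

<myDocument xmlns="http://example.org/my-namespace"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://example.org/my-namespace
                                http://example.org/schemas/myDocument.xsd">
   <!-- document content goes here -->
</myDocument>

The value of xsi:schemaLocation is simply a list of namespace/location pairs, so the same hint works for documents that mix several namespaces.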

Give the free Akoma Ntoso XML Validator a try. You can access it here. Please send me any feedback you might have.

[Screenshots: Input Form and Validation Results]
Akoma Ntoso, LegisPro Web, Standards, Transparency

Akoma Ntoso Challenge by the Library of Congress

As many of you may have already read, the U.S. Library of Congress has announced a data challenge using Akoma Ntoso. The challenge lasts for three months and offers a $5,000 prize to the winner.

In this challenge, participants are asked to mark up four Congressional bills, provided as raw text, into Akoma Ntoso.

If you have the time to participate in this challenge and can fulfill all the eligibility rules, then I encourage you to step up to the challenge. This is a good opportunity to give Akoma Ntoso a try – both to learn the new model and to help identify any changes or adaptations that must be made to make Akoma Ntoso suitable for use with Congressional legislation.

You are asked, as part of your submission, to identify gaps in Akoma Ntoso’s design along with documenting the methodology you used to construct your solution to the four bills. You’re also encouraged to use any of the open-source editors currently available for editing Akoma Ntoso and to provide feedback on their suitability to the task.

I would like to point out that I also provide an Akoma Ntoso editor at http://legisproweb.com. It is free to use on the web along with full access to all the information you need to customize the editor. However, while our customers do get an unrestricted internal license to the source code, our product is not open source. At the end of the day, I must still make a living. Nonetheless, I believe that you can use any editor you wish to create your four Akoma Ntoso documents – it’s just that the sponsors of the competition aren’t looking for feedback on commercial tools. If you do choose to use my editor, I’ll be there to provide any support you might need in terms of features and bug fixes to help speed you on your way.

Akoma Ntoso, Standards, Transparency

Legislative Data: The Book

Last week, as I was boarding the train at Admiralty station in Hong Kong to head back to the office, I learned that I am writing a book. +Ari made the announcement on his blog. It seems that Ari has found the key to getting me to commit to something – put me in a situation where not doing it is no longer an option. Oh well…

Nonetheless, there are many good reasons why now is a good time to write a book. In the past year we have experienced a marked increase in interest in the subject of legislative data. I think that a number of factors are driving this. First, there is renewed interest in driving towards a worldwide standard – especially the work being done by the OASIS LegalDocumentML technical committee. Second, the push for greater transparency, especially in the USA, is driving governments to investigate opening up their databases to the outside world. Third, many first generation XML systems are now coming due for replacement or modernization.

I find myself in the somewhat fortuitous position of being able to view these developments from an excellent vantage point. From my base in San Diego, I get to work with and travel to legislatures around the world on a regular basis. This allows me to see the different ways people are solving the challenges of implementing modern legislative information management systems. What I also see is how many jurisdictions struggle to set aside obsolete paper-based models for how legislative data should be managed. In too many cases, the physical limitations of paper are used to define the criteria for how digital systems should work. Not only do these limitations hinder the implementation of modern designs, they also create barriers that will prevent fulfilling the expectations that come as people adapt to receiving their information online rather than on paper.

The purpose of our book will be to propose a vision for the future of legislative data. We will share some of our experiences around the world – focusing on the successes some legislatures have had as they’ve broken with legacy models for how things must work. In some cases, the changes involve simply doing a better job of separating the physical limitations of the published form from the underlying content and structure. In other cases, we’ll explain how different procedures and conventions can not only facilitate the legislative process, but also make it more open and transparent.

We hope that by producing a book on the subject, we can help clear the path for the development of a true industry to serve this somewhat laggard field. This will create the conditions that allow a standard, such as Akoma Ntoso, to thrive, which, in turn, will allow interchangeable products to be built to serve legislatures around the world. Achieving this goal will reduce the costs and the risks of implementing legislative information management systems and will allow the IT departments of legislatures to meet both the internal and external requirements being placed upon them.

Ari extended an open invitation to everyone to propose suggestions for topics for us to cover. We’ve already received a lot of good interest. Please keep your ideas coming.

Akoma Ntoso, HTML5, LegisPro Web, Standards, Transparency

2013 Legislative Data and Transparency Conference

Last week I participated in the 2013 Legislative Data and Transparency Conference put on by the U.S. House of Representatives in Washington, D.C.

It was a one-day event that featured numerous speakers both within the U.S. government and in the surrounding transparency community around D.C. My role, at the end of the day, was to speak as a panelist along with Josh Tauberer of GovTrack.us and Anne Washington of The George Washington University on Under-Digitized Legislative Data. It was a fun experience for me and allowed me to have a friendly debate with Josh on APIs versus bulk downloads of XML data. In the end, while we both fundamentally agree, he favors bulk downloads while I favor APIs. It’s a simple matter of how we use the data.

The morning sessions were all about the government reporting the progress they have made over the past year relating to their transparency initiatives. There has been substantial progress this year and this was evident in the various talks. Particularly exciting was the progress that the Library of Congress is making in developing the new congress.gov website. Eventually this website will expand to replace THOMAS entirely.

The afternoon sessions were kicked off by Gherardo Casini of the UN-DESA Global Centre for ICT in Parliament in Rome, Italy. He gave an overview of the progress, or lack thereof, of XML in various parliaments and legislatures around the world. He also gave a brief mention of the progress in the LegalDocumentML Technical Committee at OASIS which is working towards the standardization of Akoma Ntoso. I am a member of that technical committee.

The next panel was a good discussion on extending XML. The first panelist was Eric Mill of the Sunlight Foundation who, among other things, talked about the HTML transformation work he has been exploring in recent weeks. I mentioned his efforts in my blog last week. Following him was Jim Harper of the Cato Institute. He talked about the Cato Institute’s Deepbills project. Finally, Daniel Bennett gave a talk on HTML and microdata. His interest in this subject was also mentioned in my blog last week.

One particularly fun aspect of the conference was walking in and noticing the Cato Institute’s Deepbills editor running on a table at the entrance. The reason it was fun for me is that their editor is actually a customization of an early version of the HTML5-based LegisPro Web editor which I have spent much of the past year developing. We have developed this editor to be an open and customizable platform for legislative editing. The Cato Project is one of four different implementations which now exist – two are Akoma Ntoso based and two are not. More news will come on this development in the not-too-distant future. I had not expected the Cato Institute to be demonstrating anything and it was quite a nice surprise to see software I had written up on display.

If there was any recurring theme throughout the day, it was the call for better linked data. While there has been significant progress over the past year towards getting the data out there, now it is time to start linking it all together. Luckily for me, this was the topic I had chosen to focus on in my talk at the end of the day. It will be interesting to see the progress that is made towards this objective this time next year.

All in all, it was a very successful and productive day. I didn’t have a single moment to myself all day. There were so many interesting people to meet that I didn’t get a chance to chat with nearly as many as I would have liked to.

For an amusing yet still informative take on the conference, check out Ari Hershowitz’s Tabulaw blog. He reveals a little bit more about some of the many projects we have been up to over the past year.

https://cha.house.gov/2013-legislative-data-and-transparency-conference

Akoma Ntoso, HTML5, Standards, Transparency, W3C

XML, HTML, JSON – Choosing the Right Format for Legislative Text

I find I’m often talking about an information model and XML as if they’re the same thing. However, there is no reason to tie these two things together as one. Instead, we should look at the information model in terms of the information it represents and let the manner in which we express that information be a separate concern. In the last few weeks I have found myself discussing alternative forms of representing legislative information with three people – chatting with Eric Mill at the Sunlight Foundation about HTML microformats (look for a blog from him on this topic soon), Daniel Bennett regarding microdata, and Ari Hershowitz regarding JSON.

I thought I would try and open up a discussion on this topic by shedding some light on it. If we can strip away the discussion of the information model and instead focus on the representation, perhaps we can agree on which formats are better for which applications. Is a format a good storage format, a good transport format, a good analysis/programming format, or a good all-around format?

1) XML:

I’ll start with a simple example of a bill section using Akoma Ntoso:

<section xmlns="http://docs.oasis-open.org/legaldocml/ns/akn/3.0/CSD03" 
       id="{GUID}" evolvingId="s1">
    <num>§1.</num>
    <heading>Commencement </heading>
    <content> <p>This act will go into effect on 
       <date name="effectiveDate" date="2013-01-01">January 1, 2013</date>. 
    </p> </content>
</section> 

Of course, I am partial to XML. It’s a good all-around format. It’s clear, concise, and well supported. It works well as a storage format, a transport format, and a format for analysis and other uses. But it does bring with it a lot of complexity that is quite unnecessary for many uses.

2) HTML as Plain Text

For developers looking to parse out legislative text, plain text embedded in HTML using a <pre> element has long been the most useful format.

   <pre>
   §1. Commencement
   This act will go into effect on January 1, 2013.
   </pre>

It is a simple and flexible representation. Even when a more highly decorated HTML representation is provided, I have invariably removed the decorations to leave behind this format.

However, in recent years, as governments open up their internal XML formats as part of their transparency initiatives, it’s becoming less necessary to write your own parsers. Still, raw text is a very useful base format.

3) HTML/HTML5 using microformats:

<div class="section" id="{GUID}" data-evolvingId="s1">
   <div>
      <span class="num">§1.</span> 
      <span class="heading">Commencement </span>
   </div>
   <div class="content"><p>This act will go into effect on 
   <time name="effectiveDate" datetime="2013-01-01">January 1, 2013 <time>. 
   </p></div>
</div>

As you can see, using HTML with microformats is a simple way of mapping XML into HTML. Currently, many legislative data sources that offer HTML content either provide bill text as plain text, as I showed in the previous example, or decorate it in a way that masks much of the semantic meaning. This is largely because web developers are building the output to an appearance specification rather than to an information specification. The result is class names that better describe the appearance of the text than the underlying semantics. Using microformats preserves much of the semantic meaning through the use of the class attribute and other key attributes.

I personally think that using HTML with microformats is a good way to transport legislative data to consumers that don’t need the full capabilities of the XML representation and are more interested in presenting the data than in analyzing or processing it. A simple transform could be used to take the stored XML and then translate it into this form for delivery to a requestor seeking an easy-to-consume solution.
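
To make that concrete, here is a rough sketch of the kind of XSLT that could perform the mapping for the section example above. Only a few templates are shown, elements that aren’t handled will simply fall through to the default rules, and the match patterns assume the Akoma Ntoso namespace from the first example:

<xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:akn="http://docs.oasis-open.org/legaldocml/ns/akn/3.0/CSD03">

   <!-- A section becomes a div carrying the element name as its class -->
   <xsl:template match="akn:section">
      <div class="section" id="{@id}" data-evolvingId="{@evolvingId}">
         <xsl:apply-templates/>
      </div>
   </xsl:template>

   <!-- Numbers and headings become classed spans -->
   <xsl:template match="akn:num | akn:heading">
      <span class="{local-name()}"><xsl:apply-templates/></span>
   </xsl:template>

   <!-- Content blocks and paragraphs map onto divs and p elements -->
   <xsl:template match="akn:content">
      <div class="content"><xsl:apply-templates/></div>
   </xsl:template>

   <xsl:template match="akn:p">
      <p><xsl:apply-templates/></p>
   </xsl:template>

   <!-- Dates map onto the HTML5 time element -->
   <xsl:template match="akn:date">
      <time datetime="{@date}"><xsl:apply-templates/></time>
   </xsl:template>

</xsl:stylesheet>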

[Note: HTML5 now offers a <section> element as well as an <article> element. However, they’re not a perfect match to the legislative semantics of a section and an article so I prefer not to use them.]

4) HTML5 Microdata:

<div itemscope 
      itemtype="http://docs.oasis-open.org/legaldocml/ns/akn/3.0/CSD03#section" 
      itemid="urn:xcential:guid:{GUID}">
   <data itemprop="evolvingId" value="s1"/>
   <div>
      <span itemprop="num">§1.</span>
      <span itemprop="heading">Commencement </span>
   </div>
   <div itemprop="content"> <p>This act will go into effect on 
      <time itemprop="effectiveDate" time="2013-01-01">January 1, 2013 </time>.
   </p> </div>
</div>

Using microdata, we see more formalization of the annotation convention than microformats offers – which brings along additional complexity and requires some sort of naming authority that I can’t say I fully understand or see how it will come about. But it’s a more formalized approach and is part of the HTML5 umbrella. I doubt that microdata is a good way to represent a full document. Rather, I see microdata better fitting into the role of annotating specific parts of a document with metadata. Much like microformats, microdata is a good solution as a transport format for a consumer not interested in dealing with the full XML representation. The result is a format that is rich in semantic information and is also easily rendered to the user. However, it strikes me that the effort to more robustly handle namespaces only reinvents one of XML’s more confusing aspects, namely namespaces, in just a different way.

5) JSON

{
   "type": "http://docs.oasis-open.org/legaldocml/ns/akn/3.0/CSD03#section",
   "id": "{GUID}",
   "evolvindId": "s1",
    "num" : {
      "type": "http://docs.oasis-open.org/legaldocml/ns/akn/3.0/CSD03#num",
      "text": "§1."
   },
   "heading":  {
      "type": "http://docs.oasis-open.org/legaldocml/ns/akn/3.0/CSD03#heading",
      "text": "Commencement"
   },
   "content": {
      "type": "http://docs.oasis-open.org/legaldocml/ns/akn/3.0/CSD03#content",
      "text1": "This act will go into effect on "
      "date": {
         "type": "http://docs.oasis-open.org/legaldocml/ns/akn/3.0/CSD03#date",
         "date": "2013-01-01",
         "text": "January 1, 2013"
      },
      "text2": "."
   }
}

Quite obviously, JSON is great if you’re looking to easily load the information into your programmatic data structures and aren’t looking to present the information as-is to the user. This is primarily a programmatic format. Representing the full document in JSON might be overkill. Perhaps JSON is better suited to key pieces of extracted metadata than to the full document.

There are still other formats I could have brought up, like RDFa, but I think my point has been made. There are many different ways of representing the same legislative model – each with its own strengths and weaknesses. Different consumers have different needs. While XML is a good all-around format, it also brings with it some degree of sophistication and complexity that many information consumers simply don’t need to tackle. It should be possible for me, as a consumer, to specify the form of the information that most closely fits my needs and have the legislative data source deliver it to me in that format.

[Note: In Akoma Ntoso, the format is called the “manifestation” and is specified as part of the referencing specification.]

What do you think?

Akoma Ntoso, Standards, Transparency

Legal Reference Resolvers

After my last blog post I received a lot of feedback. Thanks to everyone who contacted me with questions and comments. Given all the interest, I think I will devote a few more blog posts to the subject of legal references. It is quite possibly the most important issue that needs to be tackled anyway. (And yes, Harlan, I will try and blog more often.)

Many of the questions I received asked how I envision the resolver working. I thought I would dive into this aspect some more by defining the role of the resolver:

The role of a reference resolver is to receive a reference to a document or a fragment thereof and to do whatever it takes to resolve it, returning the requested data to the requestor.

That definition defines the role of a resolver in pretty broad terms. Let’s break the role down into some discrete functions:

  1. Simple Redirection – Perhaps the most basic service to provide will be that of a reference redirector. This service will convert a standardized virtual reference into a non-standard URL that is understood by a proprietary repository available elsewhere on the web that can supply the data for the request. The redirection service allows a legacy repository to provide access to documents following its own proprietary referencing mechanism without having to adopt the standard referencing nomenclature. In this case, the reference redirector will serve as a front to the legacy repository, mapping the standard references into non-standard ones.

  2. Reference Canonicalization – There are often a number of different ways in which a reference to a legal document can be composed. This is partly because the manner in which legal documents are typically structured sometimes encourages both a flat and a hierarchical view of the same data. For instance, one tends to think of sections in a flat model because sections are usually sequentially numbered. Often, however, those sections are arranged in a hierarchical structure which allows an alternate hierarchical model to also be valid. Another reason for alternate references is the simple fact that there are all sorts of different ways of abbreviating the same thing – and it is impossible to get everyone around the world to standardize on abbreviations. So “section1”, “sec1”, “s1”, and the even more exotic “§1” need to be treated synonymously. Also, let’s not forget about time. The requestor might be interested in the law as it existed on a particular date. The resulting reference starts to become more of a document query than a document identifier. For instance, imagine a version of a section that became operational January 1, 2013. A request for the section that was in operation on February 1, 2013 will return that January 1 version if that version was still in operation on February 1, even though the operational date of the version is not February 1. (Akoma Ntoso calls the query case a virtual expression and differentiates it from the case where the date is part of the identifier.)

    The canonicalization service will take any reference, perhaps vague or malformed, and will return one or more standardized references that precisely represent the documents that could be identified by the original reference – possibly along with a measure of confidence. I would imagine that official data services, providing authoritative legal documents, will most likely provide the canonicalization service.

  3. Repository Service – A legal library might provide both access to a document repository and an accompanying resolution service through which to access the repository. When this is the case, the resolver acts as an HTTP interface to the library, converting a virtual URL to an address of sorts in the document repository. This could simply involve converting the URL to a file path or it could involve something more exotic, requiring document extraction from a database or something similar.

    There are two separate use cases I can think of for the repository. The basic case is the repository as a read-only library. In this case, references are simply resolved, returning documents or fragments as requested. The second case is somewhat more complex and will exist within organizations tasked with developing legal resources – such as the organizations that draft legislation within the government. In this case, a more sophisticated read/write mechanism will require the resolver to work with technologies such as WebDAV which front for the database. This is a more advanced version of the solution we developed for use internally by the State of California.

  4. Resolver Routing – The most complex aspect, and perhaps the most difficult to achieve, will be resolver routing. There is never going to exist a single resolver that can resolve every single legal reference in the world. There are simply too many jurisdictions to cover – in every country, state/province, county/parish, city/town, and every other body that produces legal documents. What if, instead, there was a way for resolvers to work together to return the document requested? While a resolver might handle some subset of all the references it receives on its own, for the cases it doesn’t know about, it might have some means to negotiate or pass on the request to other resolvers it knows about in order to return the requested data.

Not all resolvers will necessarily provide all the functions listed. How resolvers are discovered, how they reveal the functions they support, and how resolvers are tied together are all topics which will take efforts far larger than my simple blog to work out. But just imagine how many problems could be resolved if we could implement a resolving protocol that would allow legal references around the world to be resolved in a uniform way.

In my next blog, I’m going to return to the reference itself and take a look at the various different referencing mechanisms and services I have discovered in recent weeks. Some of the services implement some of the functions I have described above. I also want to discuss the difference between an absolute reference (including the domain name) and a relative reference (omitting the domain name) and why it is important that references stored in the document be relative.

Akoma Ntoso, Standards, W3C

Automating Legal References in Legislation

This is a blog I have wanted to write for quite some time. It addresses what I believe to be the single most important issue when modeling information for legal informatics. It is also, I believe, the most urgent aspect that we need to agree upon in order to promote legal informatics as a real emerging industry. Today, most jurisdictions are simply cobbling together short term solutions without much consideration to the big picture. With something this important, we need to look at the big picture first and come up with a lasting solution.

Citations, references, or links are a very important aspect of the law. Laws are inherently a web of interconnections and interdependencies. Correctly resolving those connections allows us to correctly interpret the law. Mistakes or ambiguities in how those connections are made are completely unacceptable.

I work on projects around the world, as well as on the OASIS LegalDocumentML technical committee. As I travel to the four corners of the Earth, I am starting to see more clearly how this problem can be solved in a clean and extensible manner.

There are, of course, already many proposals to address this. The two I have looked at the most are both from Italy:
• A Uniform Resource Name (URN) Namespace for Sources of Law (LEX)
• Akoma Ntoso References (in the process of being standardized by OASIS)

My thoughts derive from these two approaches, both of which I have implemented in one way or another, with varying degrees of success. My earliest ideas were quite similar to the LEX-URN proposal in being based around URNs. However, over time Fabio Vitali at the University of Bologna has convinced me that the approach he and Monica Palmirani put forth with Akoma Ntoso using URLs is more practical. While URNs have their appeal, they really have not achieved the critical mass of adoption needed to be practical. Also, the general reaction I have gotten to LEX-URN encoded references has not been positive. There is just too much special encoding going on within them for them to be readable by the uninitiated.

Requirements

Before diving into this subject too deep, let’s define some basic requirements. In order to be effective, a reference must:
• Be unambiguous.
• Be predictable.
• Be adaptable to all jurisdictions, legal systems, and all the quirks that arise.
• Be universal in application and reach.
• Be implementable with current tools and technologies.
• Be long lasting and not tied to any specific implementation.
• Be understandable to mere mortals like myself.

URI/IRI

URIs (Uniform Resource Identifiers) give us a way to identify resources in a computing system. We’re all familiar with URLs that allow us to retrieve pages across the web using hierarchical locations. Less well known are URNs which allow us to identify resources using a structured name which presumably will then be located using some form of a service to map the name to a location. The problem is, a well-established locating service has never come about. As a result, URNs have languished as an idea more than a tool. Both URLs and URNs are forms of URIs.

IRIs are a generalization of URIs to allow characters outside of the ASCII character set supported by normal URIs. This is important in jurisdictions that use more complex characters than ASCII supports.

Given the current state of the art in software technology, basing references on URIs/IRIs makes a lot of sense. Using the URL/IRL variant is the safer and more universally accepted approach.

FRBR

FRBR is the Functional Requirements for Bibliographic Records. It is a conceptual entity-relationship model developed by librarians for modeling bibliographic information in databases. In recent years it has received a fair amount of attention for use as the basis for legal references. In fact, both the LEX-URN and the Akoma Ntoso models are based, somewhat loosely, on this model. At times, there is some controversy as to whether this model is appropriate or not. My intent is not to debate the merits of FRBR. Instead, I simply want to acknowledge that it provides a good overall model for thinking about how a legal reference should be constructed. In FRBR, there are four main entities:
1. Work – The work is the “what”, allowing us to specify what it is that we are referring to, independent of which version or format we are interested in.
2. Expression – The expression answers the “from when” question, allowing us to specify, in some manner, which version, variant, or time frame we are interested in.
3. Manifestation – The manifestation is the “which format” part, where we specify the format that we would like the information returned as.
4. Item – The item finally allows us to specify the “from where” part, when multiple sources of the information are available, that we want the information to come from.

That’s all I want to mention about FRBR. I want to pick up the four concepts and work from them.

What do we want?

Picking up the Akoma Ntoso model for specifying a reference as a URL, and mindful of our basic requirements, a useful model to reference a resource is as a hierarchical URL, starting by specifying the jurisdiction and then working hierarchically down to the item in question.

This brings me to the biggest hurdle I have come across when working with the existing proposals. It’s not terribly clear what a reference should be like when the item being referenced is a sub-part of a resource being modeled as an XML document. For instance, how would I refer to section 500 of the California Government Code? Without putting in too much thought, the answer might be something like /us-ca/codes/gov.xml#sec500, using a URL to identify the Government Code followed by a fragment identifier specifying section 500 of the Government Code. The LEX URN proposal actually suggests using the # fragment identifier, referring to the fragment as a partition. There are two problems with this solution though. First, any browser will interpret a reference using the fragment identifier as two parts – the part before the # fragment identifier showing the resource to be retrieved from the server and the part after the fragment identifier as an “id” to the item to scroll to. Retrieving the huge Government code when all we want is the one sentence in Section 500 is a terrible solution. The second problem is that it defines, possibly for all time, how a large document might have been constructed out of sub-documents. For example, is the US Code one very large document, does it consist of documents made out of the Titles, or as it is quite often modeled, is every section a different document? It would be better if references did not capture any part of this implementation decision. A better approach is to allow the “what” part of a reference to be specified as a virtual URL all the way down to whatever is wanted, even when the “what” is found deep inside an XML document in a current implementation. For example, the reference would better be specified as /us-ca/codes/gov/sec500. We’re not exposing in the reference where the document boundaries currently exist.

On to the next issue, what happens when there is more than one possible way to reference the same item? For example, the sections in California’s codes, as is usually the case, are numbered sequentially with little regard to the heading hierarchy above the sections. So a reference specified as /us-ca/codes/gov/sec500 is clear, concise, and unambiguous. It follows the manner in which sections are cited in the text. But /us-ca/codes/gov/title1/div3/chap6/sec500 is simply another way to identify the exact same section. This happens in other places too. /us-ca/statutes/2012/chap5 is the same document as /us-ca/bills/2011/sb730. So two paths identify the same document. Do we allow two identities? Do we declare one as the canonical reference and the other as an alternate? It’s not clear to me.

What about ambiguity? Mistakes happen and odd situations arise. Take a look at both Chapter 14s that exist in Division 6 of Title 1 of the California Government Code. There are many reasons why this happens. Sometimes it’s just a mistake and sometimes it’s quite deliberate. We have to be able to support this. In California, we disambiguate by using “qualifying language” which we embed somehow into the reference. The qualifying language specifies the last statute to create or amend the item needing disambiguation.

The From When do we want it?

A hierarchical path identifies, with some disambiguation, what it is we want. But chances are that what we want has varied over time. We need a way to specify the version we’re looking for or ask for the version that was valid at a specific point in time. Both the LEX URN and the Akoma Ntoso proposals for references suggest using an “@” sign followed by some nomenclature which identifies a version or date. (The Akoma Ntoso proposal adds the “:” sign as well.)

A problem does arise with this approach though. Sometimes we find that multiple versions exist at a particular date. These versions are all in effect, but based on some conditional logic, only one might be operational at a particular time. How one deals with operational logic can be a bit tricky at times. That’s an open issue to me still.

Which Format do we want?

I find specifying the format to be relatively uncontroversial. The question is whether we specify the format using well-established extensions such as .pdf, .odt, .docx, .xml, and .html, or whether we instead try to be more precise by embedding or encoding the MIME type into the reference. Personally, I think that simple extensions, while less rigorous and subject to unfortunate variations and overlaps, offer an approach far more likely to be adopted than trying to use the MIME type somehow. Simple generally wins over rigorous but more complex solutions.

The From Where should it come?

This last part, the from where should it come part, is something that is often omitted from the discussion. However, in a world where multiple libraries offering the same resource will quite likely exist, this is really important. Let’s take a look at the primary example once more. We want section 500 of the California Government Code. The reference is encoded as /us-ca/codes/gov/sec500. Where is this information to come from? Without a domain specified, our URL is a local URL, so the presumption is that it will be locally resolved – the local system will find it, somehow. What if we don’t want to rely on a local resolution function? What if there are numerous sources of this data and we want to refer to one of them in particular? When we prepend the domain, aren’t we specifying where we want the information to come from? So if we say http://leginfo.ca.gov/us-ca/codes/gov/sec500, aren’t we now very precisely specifying the source of the information to be the official California source? Now, say the US Library of Congress decides to extend THOMAS to offer state legislation. If we want to specify that copy, we would simply construct a reference as http://thomas.loc.gov/us-ca/codes/gov/sec500. It’s the same URL after the domain is specified. If we leave the URL as simply /us-ca/codes/gov/sec500, we have a general reference and we leave it to the local system to provide the resolution service for retrieving and formatting the information. We probably want to save references in a general fashion without a domain, but we certainly will need to refer to specific copies within the tools that we build.

Resolvers

The key to making this all work is having resolvers that can interpret standardized references and find a way to provide the correct response. It is important to realize that these URLs are all virtual URLs. They do not necessarily resolve to files that exist. It is the job of the resolving service to either construct the valid response, possibly by digging into databases and files, or to negotiate with other resolvers that might do all or part of the job of providing a response. For example, imagine that Cornell University offers a resolver at http://lii.cornell.edu. It might, behind the scenes, work with the official data source at http://leginfo.ca.gov to source California legislation. Anyone around the world could use the Cornell resolver and be unaware of the work it is doing to source information from resolvers at the official sources around the world. So the local system would be pointed to the Cornell service and, when the reference /us-ca/codes/gov/sec500 arose, the local system would defer to the LII service for resolution, which in turn would defer to California’s official resolver. In this way, the resolvers would bear the burden of knowing where all the official data sources around the world are located.

Examples

So to end, I would like to sum up with some examples:

[Note that the links are proposals, using a modified and simplified form of the Akoma Ntoso proposal, rather than working links at this point]

/us-ca/codes/gov/sec500
– Get Section 500 of the California Government Code. It’s up to the local service to decide where and how to resolve the reference.

http://leginfo.ca.gov/us-ca/codes/gov/sec500
– Get Section 500 of the California Government Code from the official source in California.

http://lii.cornell.edu/us-ca/codes/gov/sec500
– Get Section 500 of the California Government Code from Cornell’s LII and have them figure out where to get the data from

/us-ca/codes/gov/sec500@2012-01-01
– Get Section 500 of the California Government Code as it existed on January 1, 2012

/us-ca/codes/gov/sec500@2012-01-01.pdf
– Get Section 500 of the California Government Code as it existed on January 1, 2012, in a PDF format

/us-ca/codes/gov/title1/div3/chap6/sec500
– Get Section 500 of the California Government Code, but with the full hierarchy specified

My blog has gotten very long and I have only just started to scratch the surface. I haven’t addressed multilingual issues, alternate character sets, and a host of other issues at all. It should already be apparent that this is all simply a natural extension of the URLs we already use, but with sophisticated services underneath resolving to items other than simple files. Imagine for a moment how the field of legal informatics could advance if we could all agree to something this simple and comprehensive soon.

What do you think? Are there any other proposals, solutions, or prototypes out there that address this? How does the OASIS LegalDocumentML work factor into this?

Akoma Ntoso, Standards

“A Bill is a Bill is a Bill!”

I remember overhearing someone exclaim “A Bill is a Bill is a Bill” many years ago. What she meant is that a bill should be treated as a bill in the same way regardless of the system it flows through.

I’m going to borrow her exclamation to refer to something a bit different by rephrasing it slightly. Is a bill a Bill and an akn:bill? Lately I’ve been working in a number of different jurisdictions, and through my participation in OASIS and the LEX Summer School, with many other people from even more jurisdictions. It sometimes seems that everything bogs down when it comes to terminology. In the story of Babel, God is said to have divided the world into different languages to thwart the people’s plans to build the tower of Babel. Our problem isn’t that we’re all speaking different languages. Rather it’s that we all think we’re speaking the same language but our definitions are all different.

The way the human brain learns means that we learn incrementally. I remember someone once telling me that you can only know that which you almost knew before. When we try and learn a new thing, we cling to the parts we think we understand and our brain filters out the things we don’t. Ever learn a new word and then realize that it’s being used all around you every day – but you seemingly had never heard it before? So what happens when we recognize terms that are familiar, but fail to notice the details that would tell us that the usage isn’t quite what we expect? We misinterpret what we learn and then we make mistakes. Usually, through a slow process of trial and error, we learn from our mistakes and eventually we master the new subject. This takes time, limiting our effectiveness until our competency rises to an adequate level.

Let’s get back to the notion of a bill. Unfortunately in legislation, there is often a precise definition and an imprecise definition for a term and we don’t always know which is being used. What’s worse, if we’re not indoctrinated in the terminology, we might never realize that there is ambiguity in what we are hearing. For instance, I know of three different definitions for the word “bill”:

  1. The first usage is a very precise definition and it describes a document, introduced in a legislative body, that proposes to modify existing law. I will call this document a Bill (with a capital B). In California, all legislative documents which modify law are Bills. Subdivision (b) of Section 8 of Article 4 of the California Constitution defines it this way. At the federal level, the same definition of a bill exists, except that another document, the Joint Resolution, has similar powers to enact law, but this document is not a Bill.
  2. The second definition is the much looser definition that applies the word to any official documents acted upon by a legislature. I will use the term bill (with a lower-case b) when referring to this usage. In California, the precise term that is synonymous with bill is Measure. At the federal level, the precise term used is either Measure or Resolution. Of course, this opens up more confusion. In California, Measures are either Bills (and will affect law when enacted) or are Resolutions which express a statement from the legislature or house without directly affecting the law. So now the Federal Resolution is a synonym for a Measure while the California Resolution is a subclass of it. The Federal equivalent of a California Resolution is a Non-binding Resolution.
  3. The third definition is the Akoma Ntoso definition of a bill which I will refer to as the akn:bill. At first glance, it appears to equate with the precise definition of a Bill. It is defined as a document that affects the law upon enactment. But this definition breaks down. The akn:bill applies more broadly than the precise definition of a Bill but not as broadly as the imprecise definition of a bill. So an akn:bill applies to the federal Joint Resolution, Constitutional Amendments, California Initiatives along with the precise notion of a Bill.

I can summarize all this by saying that all Bills are akn:bills, and all akn:bills are bills, but not all bills are akn:bills, and not all akn:bills are Bills.

As if this isn’t confusing enough, other terms are even more overloaded. As I already alluded to, the term resolution is quite overloaded and the term amendment is even worse. Even the distinction between a bill and an act is unclear. Apparently a Bill, when it has passed one house, technically becomes an Act even before it passes the opposite house, but the imprecise term bill generally continues to be used.

To try and untangle all this confusion and allow us to communicate with one another more effectively, I have started a spreadsheet to collect all the terms and definitions I come across during my journey through this subject. My goal is to try and find the root concepts that are hidden underneath the vague and/or overloaded terminology we use and hopefully find some neutral terms which can be used to disambiguate communication between people coming from different legislative traditions. The beginning of my effort can be found at:

https://docs.google.com/spreadsheet/ccc?key=0ApeIHP2TOckZdHRPZ1pIeVRUcVpJZzZQT1BCSHFqd0E

Please feel free to contribute. Send me any deviations and additions you may have. And if you note an error in my understanding, please let me know. I am still learning this myself.

Akoma Ntoso, Hiring, HTML5, LegisPro Web

My Week in Ravenna at the Akoma Ntoso Developer’s Workshop

Last week I attended the LEX Summer School 2012 and the Akoma Ntoso Developer’s Workshop in Ravenna, Italy. These events were hosted by the University of Bologna. This was my third year visiting Ravenna. I had originally attended the LEX Summer School 2010 as a student, returned last year as an invited speaker, and this year was again a speaker and a teacher at the Developer’s Workshop.

As always, the visit provided me with a great opportunity to compare notes with my colleagues around the world, learn a thing or two, and share my own experiences in the field of legal informatics. Of course, there has been a lot of development in Akoma Ntoso since 2010. Back then, Akoma Ntoso was a hard-to-pronounce secret that nobody had heard about. Today, people are slowly learning how to pronounce Akoma Ntoso, the schema is well on its way to becoming a standard, and it is far better known in this field.

The biggest development undoubtedly didn’t happen in Ravenna but rather at the World e-Parliament Conference in Rome, where a number of parliaments and legislative bodies from around the world committed to providing some support for Akoma Ntoso in the future. Moving forward to a common standard will be the biggest advancement for the still fledgling legal informatics industry.

Back in Ravenna at the LEX Summer School, I once again presented legix.info, which demonstrates the application of Akoma Ntoso to California legislation as well as other select legislative documents from around the world.

Once the LEX Summer School concluded, we switched to the Akoma Ntoso Developer’s Workshop. While the group was small, the people involved were all hard-core legislative information systems developers and the discussions were lively and fruitful. A number of tools were described:

  • Bungeni – Ashok Hariharan described the Bungeni system, based around OpenOffice, that has been developed for the Global Centre for ICT in Parliament at UNDESA. Bungeni is a full-fledged legislative information system incorporating an editor, workflow, calendaring, citizen participation, and other modules.
  • Italian Senate Editor – Carlo Marchetti and Roberto Battistoni of the Italian Senate described their editor, built on an earlier version of MS Word using the custom XML tagging technology that Microsoft has since been forced to remove. As this technology has been abandoned by Microsoft, the Italian Senate is having to implement an alternative approach using Word content controls.
  • Unibo Web-Based Editor – Monica Palmirani surprised us all by introducing a web-based editor that the University of Bologna is developing for Akoma Ntoso. It’s quite similar to the LegisPro Web editor, having been inspired by my work. As can be expected, they are learning many of the same lessons that I have learned on my journey. Where their approach differs from mine is that they’re relying less on new HTML5 capabilities and more on pre-existing open source frameworks and modules.
  • AT4AM – Philip Luppens described the AT4AM amendment tool developed for the European Union Parliament. This is a web-based editing environment for creating European style amendments. It is built using the Google Web Toolkit (GWT). The EU have announced plans to produce an updated open source version of AT4AM.
  • LegisPro Web – I concluded the workshop by describing our new HTML5-based LegisPro Web editor for Akoma Ntoso. I spent much of my time describing the HTML5 technologies we use and the issues I have uncovered in adapting Akoma Ntoso to an editing environment for US-style legislation. It was an interesting and engaging conversation. It’s always a bit nerve-wracking to put your own work up on display for other people to tear apart, but the result is always a better product in the end.

In the meantime, while I was in Italy, there have been a lot of developments back stateside. There are now two active projects to implement the LegisPro Web editor. One project is based on Akoma Ntoso while the other uses the US House DTD. In addition, we are in the process of concluding a contract to add a third project, using a custom schema developed by one of the states in the USA. And if that wasn’t enough, we have just received another large contract to update a legislative system.

We need a few good people!!! All this progress means that we’re going to be stretched thin going into the future. And what that means is that we’re looking for a few good people. If you or anyone you know is looking for a new position in this fledgling legal informatics industry, please contact us at jobs@xcential.com. We’re looking for people with a passion for this subject first and foremost. You will ideally have good XML experience in the area of document authoring and will be quite comfortable with schemas and DTDs. Our development platforms are quite diverse. Of course, as we do a lot of cutting edge browser-side development, we need good quality JavaScript developers. This is not just any old JavaScript development though. It’s complex AJAX-style development using the latest HTML5 capabilities, lots of sophisticated DOM manipulation, and complex XSL Transforms. Also, our other client platform is XMetaL. Most of the development in XMetaL is also JavaScript based, with some need for Perl experience. On the backend, we do development in both Java and .Net depending on the customer and we use SQL Server, mySQL, and Oracle XDB for our database work.

Akoma Ntoso, Hackathon, HTML5, Standards, W3C

Update on our Web-based Legislative Editor

It’s been a while since my last blog post. I’ve been busy working on a number of activities. As a result, I have a lot of news to announce regarding the web-based editor, previously known as the AKN/Editor, that we originally built for the “Unhackathon” back in May.

As you might already have guessed, it has a new name. The new name is “LegisPro Web” which now more clearly identifies its role and relationship to Xcential’s XMetaL-based standalone editor “LegisPro”. Going forward, we will be migrating much of the functionality currently available in LegisPro into LegisPro Web.

Of course, there is now a new web address for the editor – http://legisproweb.com. As before, the editor prototype is freely available for you to explore at this address.

As I write this blog early this Sunday morning, I am in Ravenna, Italy, where I just participated in the LEX Summer School 2012 put on by the University of Bologna. On Monday, the Akoma Ntoso Developer’s Workshop starts at the same venue. In addition to listening to the other developers present their work, I will be spending an afternoon presenting all the ins and outs of the LegisPro Web editor. I’m excited to have the opportunity to learn about the other developers’ experiences with Akoma Ntoso and to share my own experiences building a web-based XML editor.

Last month we demonstrated the LegisPro Web editor at the National Conference of State Legislatures’ (NCSL) annual summit in Chicago. It was quite well received. I remain surprised at how much interest there is in an editor that is targeted to a tagging role rather than an editing role.

Of course, there has been a lot of development of the editor going on behind the scenes. I have been able to substantially improve the overall stability of the editor, its compliance with Akoma Ntoso, as well as add significant new functionality. As I become more and more comfortable and experienced with the new HTML5 APIs, I am starting to build up a good knowledge base of how best to take advantage of these exciting new capabilities. Particularly challenging for me has been learning how to intuitively work with the range selection mechanism. The origins of this mechanism are loosely related to the similar mechanism that is available within XMetaL. While I have used XMetaL’s ranges for the past decade, the HTML5 mechanisms are somewhat more sophisticated. This makes them correspondingly harder to master.

And perhaps the most exciting news of all is that the editor now has some customers. I’m not quite ready to announce who they are, but they do include a major government entity in a foreign country. As a result of this win, we will be further expanding our support of Akoma Ntoso to support Debate and Debate Reports in addition to the Bill and Act document types we currently support. In addition, we will be adding substantial new capabilities which are desired by our new customers. I should also mention that Ari Hershowitz (@arihersh) has joined our team and will be taking the lead in delivering the customized solution to one of our customers.

Alongside all this development, work continues at the OASIS LegalDocumentML Technical Committee. Look to see me add support for the Amendment document type to the editor in support of this activity in the not-too-distant future.

All in all, I think we’re making terrific progress bringing the LegisPro Web editor to life. I started work on the editor as a simple idea a little more than six months ago. We used the “Unhackathon” in May to bootstrap it to life. Since then, it’s taken off all on its own and promises to become a major part of our plans to build a legitimate legal informatics industry around an Akoma Ntoso based standard.
