Akoma Ntoso, Standards

Common Identifiers or a Common Data Format: Which Is More Important?

I just read this excellent post by Tom Bruce et al. from the Legal Information Institute at the Cornell University Law School.

Tom’s post brought to mind something I have long wrestled with. (Actually so long that it was a key part of my job working on CAD systems in the aerospace industry long ago). I sometimes wonder if having common identifiers isn’t more important than having a common data format. The reason is that being able to unambiguously establish relationships is both very difficult and very useful. In fact, one of the reasons you want a common format is so that you can find and establish these identifiers.

I have used a number of schemes for identifiers over the past ten years. Most of them have involved Uniform Resource Names, or URNs. A decade ago we designed a URN mechanism for use in the California Legislature. Our LegisWeb product uses a very similar URN scheme based on the lessons learned on the California project. In more recent years I have experimented with the URN:LEX proposal and the URL-based proposal within Akoma Ntoso. Both of these proposals are based around FRBR (Functional Requirements for Bibliographic Records). I can’t say I have found the ideal solution.

I favor a URN-based mechanism for a number of reasons. URNs are names – that was their intent as defined by the IETF. They are not meant to imply a location or organizational containment (well, mostly they aren’t). In theory, these identifiers can be passed around and resolved to find the most relevant representation when needed. But there is a problem. URNs have never really been accepted. While they are conceptually very valuable, their poor acceptance and lack of supporting tools tend to undermine their value.

Akoma Ntoso takes a different approach. It uses URLs instead of URNs. Rather than using URLs as locations, though, they are used as if they were identifiers. It is the duty of the web server, and the applications plugged into it, to intercept the URLs and treat them as identifiers. In doing this, the web server provides the same resolution functions that URNs were supposed to offer. My upcoming editor implements this functionality. I have built HTTP handlers that convert URLs into repository queries which retrieve and compose the requested documents. I have it working, and it works well – as far as I understand the Akoma Ntoso proposal. I’m still not totally crazy about overloading identifier semantics on top of location semantics, though. At least the supporting technology is better established.
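
To make this concrete, here is a minimal sketch, in TypeScript on Node.js, of what such a handler might look like: the incoming URL is treated purely as a name and handed to a repository for resolution. The repository interface, the stub lookup, and the port number are assumptions made for illustration, not the actual implementation behind my editor.

```typescript
// Minimal sketch: treat an incoming URL as a document identifier, not a file path.
// The Repository interface and its stub implementation are hypothetical.
import { createServer } from "http";

interface Repository {
  // Resolve an identifier to a composed XML document, or null if unknown.
  fetch(identifier: string): Promise<string | null>;
}

const repository: Repository = {
  async fetch(identifier: string) {
    // A real implementation would query an XML repository and compose the result.
    return `<akomaNtoso><!-- document resolved from ${identifier} --></akomaNtoso>`;
  },
};

createServer(async (req, res) => {
  // Drop any query string; what remains is treated purely as a name to resolve.
  const identifier = (req.url ?? "/").split("?")[0];
  const document = await repository.fetch(identifier);
  if (document === null) {
    res.writeHead(404, { "Content-Type": "text/plain" });
    res.end("Unknown identifier");
    return;
  }
  res.writeHead(200, { "Content-Type": "application/xml" });
  res.end(document);
}).listen(8080);
```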

So what issues have I struggled with?

First of all, none of the proposals seem to adequately address how you deal with portions of documents. There are many issues in this area. The biggest, of course, is the inherent ambiguity within legislative documents. As Tom mentioned in his post, duplicate numbering often occurs. There are usually perfectly good and valid reasons for this – such as different operational conditions. But sometimes these are simply errors that everyone has to accommodate. Being able to specify the information necessary to resolve the ambiguity is not in any proposal I have seen. Add to that the temporal issues that come with renumbering actions. How do you refer to something that is subject to amendment and renumbering? Do you want a reference to specific wording at a specific point in time, or do you want your reference to track with amendments and renumbering?

At this point people often ask me why a hash (#) followed by a cleverly designed element ID won’t work. The first thing you have to realize is that the # means something to the effect of “retrieve the document and then scroll to that ID”. The semantic I am looking for is “retrieve the document fragment located at that ID”. The importance of the difference becomes obvious when you realize that the client browser holds on to the “#” part of the request, and all the server sees is the document URL, minus the hash and identifier. When your document is a thousand pages long and all you want is a single section, that distinction is quite important. Secondly, managing IDs across renumbering actions is very messy and introduces as many problems as it solves.
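
A tiny example makes the distinction visible. Using the standard URL API (the example URL, fragment, and alternative path convention are made up for illustration), the fragment never leaves the browser:

```typescript
// The fragment after "#" is kept by the browser; only the path reaches the server.
const ref = new URL("https://example.org/code/title5/chapter2#sec-101");

console.log(ref.pathname); // "/code/title5/chapter2"  <- all the server ever sees
console.log(ref.hash);     // "#sec-101"               <- held by the client

// To ask the server for just the fragment, the section has to be part of what
// the server receives, e.g. carried in the path itself (hypothetical convention):
const fragmentAware = "https://example.org/code/title5/chapter2/sec-101";
```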

Secondly, the referencing mechanism tends to be document oriented. Certainly, Akoma Ntoso uses virtual URL identifiers to refer to much more than simple documents, but the whole approach gets cumbersome and hard to explain. (If you want to appreciate this, try explaining XML Schema’s namespace URI/URL concept to an uninitiated developer.) What’s more, it’s not clear if a common URL mechanism does enough to establish common enough practices for the effort to be useful. For instance, what if I want to refer to the floor vote after the second reading in the Assembly? Is there a reference to that? In California there is. That’s because the results of that vote are reported as a document. But there is nothing that says this should be the case. I have had the need to interrelate a number of ancillary documents with the legislation. How to do that in a consistent way is not all that clear cut.

The third problem is user acceptance. The URN:LEX proposal, in particular, looks quite daunting. It uses lots of characters like @, $, and ;. While end users can be shielded from some of this, my experience has taught me that even software developers rebel against complexity they can’t understand or appreciate. So far, this has been a struggle.

I’m eagerly awaiting Part 2 of Tom’s post on identifiers. It’s a great subject to explore.

Akoma Ntoso, Standards, Transparency

Imagine All 50 States Legislation in a Common Format

Last week I expressed disappointment over NCSL’s opposition to the DATA Act (H.R. 2146). Their reasoning is that the burden this might create on the states’ systems will not be affordable. Contrast this with the topic of the international workshop held in Brussels last week – “Identifying benefits deriving from the adoption of XML-based chains for drafting legislation”. The push toward more transparent government need not be unaffordable.

With that in mind, stop for a while and imagine having the text of all 50 states’ legislation published in a common XML format. Seems like an impossibly difficult and expensive undertaking, doesn’t it? With all the requirements gathering, getting systems to cooperate, and getting buy-in throughout the country, this could be another super-expensive project that in the end would fail. What would such a project cost? Millions and millions?

Well, remember again Henry Ford’s quote: “If you think you can do a thing or think you can’t do a thing, you’re right”. Would you believe that a system to gather and publish the legislation of all 50 states has recently been developed, in months rather than years, and on a shoestring budget? That system is BillTrack50.com. It’s a 50-state bill tracking service. Check it out! We at Xcential helped them with this herculean task by providing a simple and neutral XML format and the software to do much of the processing. The press release is here. The format is SLIM, the same format that underlies my legix.info prototype. It’s a simple, easy-to-adopt XML format built on our past decade’s experience in legislative systems. Karen Sahuka of BillTrack50 recently gave a presentation on her product at the Non-profit Technology Conference in San Francisco.

SLIM is not as ambitious as Akoma Ntoso. If you take a gander at my legix.info site, you will see that it’s very easy to go from SLIM to Akoma Ntoso. In fact, going between any two formats is not all that difficult with modern transformation technology. It’s how we built the publishing system for the State of California as well. My point is that with the right attitude, a little innovation, and the right tools, achieving the modern requirements for accountability and transparency need not be out of reach.
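
As a rough illustration of how lightweight such a transformation step can be (this is a sketch only; the stylesheet name and URL are placeholders, not our actual production pipeline), the browser’s built-in XSLT support is already enough to drive a SLIM-to-Akoma Ntoso conversion:

```typescript
// Sketch: run an XSLT transform in the browser. The stylesheet "slim-to-akn.xsl"
// is a placeholder name, not an actual published artifact.
async function transformSlimToAkomaNtoso(slimXml: string, xslUrl: string): Promise<string> {
  const parser = new DOMParser();
  const slimDoc = parser.parseFromString(slimXml, "application/xml");

  const xslText = await (await fetch(xslUrl)).text();
  const xslDoc = parser.parseFromString(xslText, "application/xml");

  const processor = new XSLTProcessor();
  processor.importStylesheet(xslDoc);   // load the SLIM-to-Akoma Ntoso rules
  const resultDoc = processor.transformToDocument(slimDoc);

  return new XMLSerializer().serializeToString(resultDoc);
}

// Usage (hypothetical URL):
// const akn = await transformSlimToAkomaNtoso(slimText, "/xsl/slim-to-akn.xsl");
```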

Standards, Transparency

The State of the Art in Legislative Editors and the DATA Act

(My plan was for my next post to contain a mini-tutorial on my editor; that is still coming this weekend.)

A report on legislative editors has just been released in Europe. You can find the report at https://joinup.ec.europa.eu/elibrary/document/isa-leos-final-results. It’s a very interesting read. It’s focused on Europe but is something we should look at seriously in the US.

After almost a decade in this business, I discovered my European counterparts a couple of years ago when I attended the LEX Summer School in Ravenna, Italy. (Info on this year’s class can be found here.) What struck me was how much innovative work was occurring in Europe compared to the USA. Sure, we have plenty of XML initiatives in the USA and there are many examples of modern, up-to-date systems we can point to, but there is a lot of fragmentation and duplication of effort and learning. All in all, it is my feeling that we’re falling far behind in this field. And yet the Europeans expect and want leadership from the USA; we’re the ones with a society more conducive to innovation and entrepreneurialism.

So how are we doing in the USA? This week the DATA Act (H.R. 2146) passed the U.S. House. It requires accountability and transparency in federal spending. Sounds like a good thing, doesn’t it? One does expect that the government we elect ultimately be accountable to we the people.

The DATA Act, while addressing federal spending, could be the impetus that drives state governments in America to update their systems to publish in open and transparent formats. Viewed as an opportunity, this act could ultimately drive better cooperation amongst the various state legislatures. This cooperation would improve innovation and progress in US legislative systems by focusing on common approaches and open standards. This less insular viewpoint would, as a result, improve efficiency and lower costs. Common standards allow common tools and common tools cost a lot less than full custom solutions. Check out the blog by Andrew Mandelbaum at NDI – http://www.demworks.org/blog/2012/04/how-xml-can-improve-transparency-and-workflows-legislatures.

Henry Ford once said, “If you think you can do a thing or think you can’t do a thing, you’re right”. I was disappointed to see NCSL come out in opposition to the DATA Act. Their reasoning is that the DATA Act is a cost they cannot afford at this point. Certainly, we are all feeling the effects of the economic meltdown of the past few years, and it’s hurting the states especially hard. But why can’t the move to open and transparent systems be viewed as an opportunity to improve efficiency and reduce costs? If modern standards-based automation were a liability, would businesses have automated to the extent they have? I don’t see very much focus on using automation as a tool to improve efficiency in legal informatics. It’s an opportunity squandered, I think.

If you want to know more about open legislative standards, consider attending our upcoming “unhackathon”. You can sign up here.

Akoma Ntoso, Hackathon, Standards

Building a Web-Based Legislative Editor

I built the legislative drafting tool used by the Office of the Legislative Counsel in California. It was a long and arduous process that took several years to complete. Issues like redlining, page and line numbers, and the complexities of tables really turned an effort that looks quite simple on the surface into a very difficult task. We used XMetaL as the base tool and customized it from there, developing what has to be the most sophisticated implementation of XMetaL out there. We even had to have a special API added to XMetaL to allow us to drive the change tracking mechanism to support the very specialized redlining needs one finds in legislation.

Using an XML editor allows one to develop a very precise and accurate solution. The price you pay is the rigidity imposed upon you by the XML structure enforced by the editor. We worked very hard to loosen up that rigidity by providing a rules-based system that gave the user some wiggle room, relying on the application to sense what was intended and produce the rigid structure that XML demands. Another approach, taken by many, is to use a word processor to generate the document, relying on styles or lightweight tagging in the word processor to guide an XML generator or transformer after the fact. This gives the drafter more flexibility, at the expense of accuracy when the drafter deviates outside the expected range of document patterns that the XML generator can handle.
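
As a toy illustration of that second, style-driven approach (the style names and element names below are invented for the example – they are not SLIM or any production mapping), the generator is essentially a lookup from word-processor styles to XML elements, with anything unexpected flagged for review rather than silently guessed:

```typescript
// Toy style-to-element mapping for a word-processor-driven XML generator.
// Style and element names are invented for illustration.
const styleToElement: Record<string, string> = {
  "Bill Section": "section",
  "Section Heading": "heading",
  "Paragraph Text": "content",
};

interface StyledParagraph {
  style: string;
  text: string;
}

function toXml(paragraphs: StyledParagraph[]): string {
  return paragraphs
    .map((p) => {
      const element = styleToElement[p.style];
      if (!element) {
        // The drafter strayed outside the expected patterns; a real generator
        // needs a recovery or review step here rather than a guess.
        return `<!-- unrecognized style: ${p.style} -->`;
      }
      return `<${element}>${p.text}</${element}>`;
    })
    .join("\n");
}
```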

I have often wondered whether a web-based editor wouldn’t be a better approach, allowing us to achieve the flexibility the user needs along with the rigidity that makes XML useful to downstream processing. In the end, the question has always been whether such an editor is even possible. When the Writely word processor (now Google Docs) came along in 2005, the answer seemed to be a tentative yes. Looking into it a little, I discovered that while feasible, given all the browser incompatibilities of the time, achieving a worthwhile editor of the sophistication needed to draft and manage legislation would still be a daunting task. So the idea has always remained in the back of my mind, waiting for browser technology to mature to the point where building such an editor comes within reach of being a reasonable approach.

That point has now arrived. With HTML5, it is now possible to build a full-fledged browser-based legislative editor. For the past few months I have been building a prototype legislative editor in HTML5 that uses Akoma Ntoso as its XML schema. The results have been most gratifying. Certainly, building such an editor is no easy task. Having been working on this subject for 10 years now, I have all the issues well internalized and can navigate the difficulties that arise. But I have come a long way towards achieving the holy grail of legislative editors – a web-based, standards-based, browser-neutral solution.


There are a lot of challenges. Interacting with all the browser events to maintain an undo stack is no easy task. Building a change tracking mechanism from scratch gives me lots of freedom but getting your head around all the variations is bewildering at times. And how do you build something that is useful to multiple jurisdictions and is sufficiently flexible to adapt to all the varying needs? This takes a trick – but one I have thought long and hard about.
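
To give a flavor of the first of those challenges (this is a deliberately reduced sketch, not how the prototype actually works; the element id “draft” is assumed for the example), an undo stack over an editable surface amounts to snapshotting state on every input event and restoring it on demand:

```typescript
// Reduced sketch of an undo stack over a contenteditable element with id "draft"
// (the id is an assumption for this example).
const surface = document.getElementById("draft") as HTMLElement;
const undoStack: string[] = [surface.innerHTML];

surface.addEventListener("input", () => {
  // Snapshot the serialized content after every edit.
  undoStack.push(surface.innerHTML);
});

function undo(): void {
  if (undoStack.length > 1) {
    undoStack.pop();                                     // discard the current state
    surface.innerHTML = undoStack[undoStack.length - 1]; // restore the previous one
  }
}

document.addEventListener("keydown", (event) => {
  if ((event.ctrlKey || event.metaKey) && event.key === "z") {
    event.preventDefault(); // suppress the browser's native undo
    undo();
  }
});
```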

HTML5 is not yet a fully defined standard, and none of the browsers fully support what is defined. The same can be said for CSS, JavaScript, XML support, and so on. But the editor I have built works with the latest versions of all four major browsers. I don’t have a single CSS quirk at all. In fact, the only substantive browser incompatibility I have had to deal with arises from the fact that Internet Explorer’s XML implementation pre-dates the standard in this area, so its API does not yet match the full standards-based API. This is the case despite IE having been the pioneer in this area.

What’s more, I have achieved this with Akoma Ntoso as the internal XML schema. Akoma Ntoso is itself a proposed standard within OASIS. Certainly not everything has been smooth sailing and I have submitted a whole slew of issues to the drafters of Akoma Ntoso, but I have been able to find my way to a workable implementation. It works!

Our plan is to use this prototype for the Unhackathon at UC Hastings and Stanford Law School on May 19th and then in follow-on events elsewhere. In a few days I will provide a link to the prototype for everyone to try it out. It’s still early days and the editor is far from being a complete and robust tool suitable for production, but what I have now confirms my belief that the next generation of legislative systems will be web-based and built on open-standards.

In my next post I will provide a little mini-tutorial to set the stage for the upcoming pre-beta release.

Akoma Ntoso, Hackathon, Standards

Towards More Affordable Solutions in Legal Informatics – with Standards

Got six million dollars? That’s the ballpark figure for a complete legislative system and it is too much. A decade ago when the technologies were still new, the risks were high, and experience was scarce, the reasons for this were easily explained. But now it’s time to move beyond the early-adopter market towards a more mature, affordable, and predictable market for legislative systems. The key to moving to this new era is standards.

Towards this end I am participating as a member of the OASIS LegalDocML Technical Committee. Our charter is to develop a common standard for legal documents. We had our initial meeting in March and are defining our deliverables for this important mission.

The wide variety of legal systems and traditions ensures that there are never going to be off-the-shelf solutions in legislative systems. However, all the differences should not deter us from finding what we have in common. It is this commonality that provides the basis for the development of common applications that can be cost-effectively adapted to local requirements. Of course, to achieve this goal you need a common information model. The OASIS TC is using Akoma Ntoso as the starting point for this information model.

Akoma Ntoso Logo

Okay, so what is Akoma Ntoso? Akoma Ntoso is an XML schema developed at the University of Bologna and supported by Africa i-Parliaments, a project sponsored by the United Nations Department of Economic and Social Affairs. It defines an XML-based information model for legislative documents. In the last few years it has been gaining traction in Europe, Africa, and South America, and now slowly in North America as well.

Do you find the name Akoma Ntoso intimidating? Well, you’re not alone. However, it’s easy to say once you know how. Simply pronounce it as in “a-coma-in-tozo” and you’ll be close. Akoma Ntoso means “linked hearts” in the Akan language of West Africa.

If you find all the talk about XML confusing, then we have a solution for you. Through my company Xcential, I am working with Ari Hershowitz (@arihersh) of Tabulaw, Charles Belle (@brainseedblog) of UC Hastings, and Pieter Gunst (@DigitalLawyer) of Stanford and LawGives.org to host two “unhackathons” on May 19th at UC Hastings and at the Stanford Law School, both near San Francisco. The point of our unhackathons is to provide participants with an entry-level and fun introduction to the world of XML markup and Akoma Ntoso. And once we’re done with the May 19th events around San Francisco, we’re going to stage other events in other locations around the world to bring the same knowledge to a much wider audience.

If you would like to attend one of these events please sign up at the Eventbrite.com site and/or contact one of the organizers. And if you would like to host or participate in a virtual unhackathon in June, please let one of us know as well. We’re looking for volunteers.

Within the next week I will be posting a prototype of a new legislative editor for use at the unhackathon.  It’s an HTML5-based editor built entirely around Akoma Ntoso. While we don’t yet have any productization plans, this editor will demonstrate the coming era of cost-effective and standardized components which can be mixed and matched to produce the next generation of legal informatics solutions. Stay tuned…

Akoma Ntoso, Hackathon, Standards

International Open Standards Hackathon

An international open standard for legislative documents will be an important next step for making legislative information available to the people. An open standard will promote the creation of tools and services worldwide that will enable citizen participation in the legislative process and will help governments make the laws they produce more transparent.

Today there is a great deal of inconsistency in how open participation and transparency are achieved around the world. Putting cultural and political differences aside, part of the reason for the inconsistency is the tremendous effort and cost involved in building the infrastructure to support these new objectives. An open standard will start to solve this problem by promoting the establishment of a real legal informatics industry of interoperable tools and services which can easily be put together to address whatever commitment a government has made to open and transparent government.

Akoma Ntoso is an emerging standard that promises to do just this. It is an XML schema developed by the University of Bologna in Italy for Africa i-Parliaments, a project sponsored by the United Nations Department of Economic and Social Affairs. In the coming weeks, an OASIS technical committee will begin the process of turning it into an international standard. I am a participant on that TC.

To further promote and publicize Akoma Ntoso, I am working with Ari Hershowitz (@arihersh) to stage an international hackathon within the next few months. The idea is to provide an event that will demystify XML and Akoma Ntoso (yes, it is hard to say) by providing a really easy way for anyone to create a document using the proposed standard. Our goal will be to collect a world’s worth of legislative samples. This could be an important step towards building a library that stitches together all the world’s laws and regulations in an open and transparent way.

We’re currently seeking sponsors, participants, and venues for this hackathon. The interest we have found so far has been quite amazing. If you are interested in helping us make this event a success, please let either Ari or me know.

Akoma Ntoso, Standards

International Meeting on Transparency and the use of Open Document Standards

This past week was a very interesting week for me. I attended the meeting “Achieving Greater Transparency in Legislatures through the Use of Open Document Standards” at the U.S. House of Representatives in Washington DC. The meeting was sponsored by the United Nations, the Inter-Parliamentary Union, and the U.S. House of Representatives.

Meeting Participants

For me it was a valuable opportunity to meet with colleagues I had already met in my travels in recent years, to meet with people I knew of but had never met, and to meet people I had corresponded with only through email. Particularly special for me was to finally meet Tim Arnold-Moore. It was by reading his thesis on “Information Systems for Legislation” back in 2001 that I became aware of the field of legal informatics.

It was quite fascinating to see the different ways in which different countries were approaching transparency. As one would expect, the American approach is a bit heavy-handed, with a focus on providing access to existing documentation. It seems that the real innovation is coming from smaller or younger countries that are less encumbered by the top-down bureaucracy that tends to squash more cost-effective innovation.

Two systems caught my attention in particular:

  • The first was the system put in place by the Brazilian Chamber of Deputies. More than merely providing their citizens with visibility to the workings of their government, this system focused on ensuring two way interaction between citizens and their elected representatives – even going so far as to allow citizens to express their viewpoints by way of YouTube-style video clips.
  • The other system to get my attention, of course, was Bungeni. This is an open source Legislative Information System developed in Nairobi, Kenya for use in the parliaments of Africa (and elsewhere in the future). It is based on the Open Office word processor, but does the most credible job of turning a word processor into an XML editor that I have seen so far. Of course, it works with Akoma Ntoso which was developed alongside it.

Speaking of Akoma Ntoso, it came up plenty of times during the conference. Monica Palmirani and Fabio Vitali from the University of Bologna in Italy presented various aspects of the XML Schema. On Tuesday the OASIS LegalDocumentML TC was opened to drive it towards being an international standard. You can read about it here.

On the Thursday after the meeting wrapped up, Knowledge as Power held a class on Legislative XML at the National Democratic Institute. I presented the work I had done on Legix.info including the transform of the U.S. Code into Akoma Ntoso.

All in all, the four days I spent in Washington D.C. were very worthwhile. Hopefully the outcome of that meeting will be better cooperation in the field of legal informatics. Clearly, after a dozen or so years of experience with XML, the time has come to start moving beyond the tentative first steps which have defined this field towards open standards and the benefits that come when everyone works towards common goals.

Akoma Ntoso, Standards

And now for something completely different… Chinese!

Last week we saw how Akoma Ntoso can be applied to a very large consolidated Code – the United States Code. This week we take the challenge in a different direction – applying Akoma Ntoso to a bilingual implementation involving a totally different writing system. Our test document this week is the Hong Kong Basic Law. This document serves as the constitutional document of the Hong Kong Special Administrative Region of the People’s Republic of China. It was adopted on April 4, 1990, and went into effect on July 1, 1997, when the United Kingdom handed the region over to the People’s Republic of China.

The Hong Kong Basic Law is available in English, Traditional Chinese, and Simplified Chinese. For our exercise, we are demonstrating the document in English and in Traditional Chinese. (Thank you to Patrick for doing the conversion for me.) Fortunately, using modern technologies, supporting Chinese characters alongside Latin characters is quite straightforward. Unicode provides a Hong Kong supplementary character set to handle characters unique to Hong Kong. The biggest challenge is ensuring that all the Unicode declarations throughout the various XML and HTML files that the information must flow through are set correctly. With the number of accents we find in names in California, as well as the rigorous nature of California’s publishing rules, getting Unicode right is something we have grown accustomed to.

While I hadn’t expected there to be any problems with Unicode, I was pleasantly surprised to find that the fonts used in Legix simply worked with the Traditional Chinese characters without issue as well. (Well, at least as far as I can tell without the ability to actually read Chinese.)

The only issue we encountered was Internet Explorer’s support for CSS3. Apparently, IE still does not recognize “list-style-type” with a value of “cjk-ideographic”. So instead of getting Traditional Chinese numerals, we get Arabic numerals. The other browsers handled this much better.

So what other considerations were there? A big consideration was the referencing mechanism. To me, modeling how you refer to something in an information model can be more important than the information model itself. The referencing mechanism defines how the information is organized and allows you to address a specific piece of information in a very precise and accurate way. Done right, any piece of information can be accessed very quickly and easily. Done wrong, you get chaos.

Our referencing mechanism relies on the Functional Requirements for Bibliographic Records (FRBR). This mechanism is used by both SLIM and Akoma Ntoso. Another interesting FRBR proposal for legislation can be found here.

FRBR defines an information model based on a hierarchical scheme of Work-Expression-Manifestation-Item. Think of the work as the overall document being addressed, the expression as the version desired, the manifestation as the format you want the information presented in, and finally the item as a means for addressing a specific instance of the information. Typically we’re only concerned with Work-Expression-Manifestation.

For a bilingual or multilingual system, the “expression” part of the reference is used to specify which language you wish the document to be returned in. If you check out the references at Legix.info, you will see the two references to the Hong Kong Basic Law.

The expressions are called out as “doc;en-uk” for the English version and “doc;zh-yue” for the Chinese version. Relatively straightforward. The manifestations are not shown, and the result is the default manifestation of HTML.
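
To show how few moving parts such a reference has, here is a small sketch that composes a Work-Expression-Manifestation reference in the style described above. The work identifier and path layout are illustrative assumptions, not the normative Akoma Ntoso naming convention.

```typescript
// Sketch: compose an FRBR-style reference from its parts. The work identifier
// "/hk/basicLaw" and the path layout are illustrative assumptions.
interface FrbrReference {
  work: string;           // the overall document, e.g. "/hk/basicLaw"
  expression: string;     // the version/language, e.g. "doc;en-uk" or "doc;zh-yue"
  manifestation?: string; // the format, e.g. "html"; omitted means the default (HTML)
}

function toUrl(ref: FrbrReference): string {
  const parts = [ref.work, ref.expression];
  if (ref.manifestation) {
    parts.push(ref.manifestation);
  }
  return parts.join("/");
}

console.log(toUrl({ work: "/hk/basicLaw", expression: "doc;en-uk" }));
// -> "/hk/basicLaw/doc;en-uk"
console.log(toUrl({ work: "/hk/basicLaw", expression: "doc;zh-yue", manifestation: "html" }));
// -> "/hk/basicLaw/doc;zh-yue/html"
```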

Check the samples out and let me know what you think.

Akoma Ntoso, Process, Standards

Legislative Information Modeling

Last week I brought up the subject of semantic webs for legal documents. This week I want to expand on that subject by discussing the technologies I have encountered recently that point the way to a semantic web. Of course, there are the usual general-purpose semantic web technologies like RDF, OWL, and SPARQL. Try as I might, I have been unable to get much practical interest in these technologies out of anyone. Part of the reason is that the abstraction they demand is just beyond most people’s grasp at this point in time. In academic circles it is easy to discuss these topics, but step into the “real world” and interest evaporates immediately.

Rather than pressing ahead with those technologies, I have chosen in recent years to step away and focus on less abstract and more direct information modeling approaches. As I mentioned last week, I see two key areas of information modeling – the documents and the relationships between them. In some respects there are three areas, if you distinguish the metadata about the documents from the documents themselves. Typically I lump the documents together with their metadata, because much of the metadata gets included with the document text, blurring the distinction and calling for a more uniform, integrated model.

The projects I have worked on over the past decade have resulted in several legislative information models. With each project I have learned and evolved, resulting in the SLIM model found today at the Legix.info demonstration website. Over time, a few key aspects have emerged as most important:

  • First and foremost has been the need for simplicity. It is very easy to get caught up in the information model, discovering all the variations out there and finding clever solutions to each and every situation. However, you can easily end up with a large and complex information model that you cannot teach to anyone who does not share your passion for and experience in information modeling. Your efforts to satisfy everyone result in a model that satisfies no one, because of the complexity that comes from trying to please too many masters.
  • Secondly, you need to build familiarity into your information model. While there are many consistently used terms in legislation, traditions around the world do vary, and sometimes very similar words have quite different meanings to different organizations. Trying to change long-standing traditions to arrive at more consistent or abstract terminology always seems to be an uphill battle.
  • Thirdly, you have to consider the usage model. Is the model intended for downstream reporting and analysis, or does it need to work in an editing environment? An editing model can be quite different from a model intended only for downstream processing, because the manner in which the model interacts with the editor must be taken into account. Two aspects matter most. First, the model must be robust yet flexible enough to handle all the intermediate states a document passes through while being edited. Second, change tracking is very important during the amendment process, and how that function will be implemented in the document editor must be considered.

While I have developed SLIM and its associated reference scheme over the past few years, in the last year I have started experimenting with a few alternate models in the hope of finding a better solution to the problem of legislative information modeling. Most recently I have started experimenting with Akoma Ntoso, developed by Fabio Vitali and Monica Palmirani at the University of Bologna and supported by Africa i-Parliaments, a project sponsored by the United Nations Department of Economic and Social Affairs. I very much like this model, as it follows many of the same ideals of good information modeling that I try to conform to. In fact, it is quite similar to SLIM in many respects. The legix.info site has many examples of Akoma Ntoso documents, created by translating SLIM into Akoma Ntoso via an XSL transform.

While I very much like Akoma Ntoso, I have yet to master it. It is a far more ambitious effort than SLIM, has many more tags, and covers a broader range of document types. Like SLIM, it covers both the metadata and the document text in a uniform model. I have yet to convince myself as to its viability as an editing schema. Adapting it to work with the editors I have worked with in the past is a project I just haven’t had the time for yet.

The other important aspect of a semantic web, as I wrote about last week, is the referencing scheme. Akoma Ntoso uses a notation based on coded URLs to implement referencing. It is partly based on the conceptually similar URN:LEX model, built around URNs and developed by Enrico Francesconi and Pierluigi Spinosa at ITTIG/CNR in Florence, Italy. Both schemes build upon the Functional Requirements for Bibliographic Records (FRBR) model. I have tried adopting both models but have run into snags, with the models either not covering enough types of relationships, scaring people away with too many special characters with encoded meaning, or resulting in too complex a location resolution model for my needs. At this point I have cherry-picked the best features of both to arrive at a compromise that works for my cases. Hopefully I will be able to evolve towards a more consistent implementation as those efforts mature.

My next effort is to take a closer look at MetaLex, an open XML-based interchange format for legislation. It has been developed in Europe and defines a set of conventions for metadata, naming, cross references, and compound documents. Many projects in Europe, including Akoma Ntoso, comply with the MetaLex framework. It will be interesting to see how easily I can adapt SLIM to MetaLex. Hopefully the changes required will amount mostly to deriving from the MetaLex schema and adapting to its attribute names. We shall see…

Process, Standards, W3C

What is a Semantic Web?

Tim Berners-Lee, inventor of the World Wide Web, defines a semantic web quite simply as “a web of data that can be processed directly and indirectly by machines”. In my experience, that simple definition quickly becomes confusing as people add their own wants and desires to it. There are technologies like RDF, OWL, and SPARQL that are considered key components of semantic web technology. It seems, though, that these technologies add so much confusion through abstraction that non-academic people quickly steer as far away from the notion of a semantic web as they can get.

So let’s stick to the simple definition from Tim Berners-Lee. We will simply distinguish the semantic web from our existing web by saying that a semantic web is designed to be meaningful to machines as well as to people. So what does it mean for a web of information to be meaningful to machines? A simple answer is that there are two primary things a machine needs to understand about a web: first, what the pages are about, and second, what the relationships that connect the pages mean.

It turns out that making a machine capable of understanding even the most rudimentary aspects of pages and the links that connect them is quite challenging. Generally, you have to resort to fragile custom-built parsers or sophisticated algorithms that analyze the document pages and the references between them. Going from pages with lots of words connected somehow to other pages to a meaningful information model is quite a chore.

What we need to improve the situation are agreed upon information formats and referencing schemes in a semantic web that can more readily be interpreted by machines. Defining what those formats and schemes are is where the subject of semantic webs starts getting thorny. Before trying to tackle all of this, let’s first consider how this all applies to us.

What could benefit more from a semantic web than legal publishing? Understanding the law is a very complex undertaking which requires extensive analysis and know-how. This problem could be simplified substantially using a semantic web. Legal documents are an ideal fit for the notion of a semantic web. First of all, the documents are quite structured. Even though each jurisdiction might have its own presentation styles and procedural traditions, the underlying models are all quite similar around the world. Secondly, legal documents are rich with relationships, or citations, to other documents. Understanding these relationships and what they mean is quite important to understanding the meaning of the documents.

So let’s consider the current state of legal publishing – and, from my perspective, legislative publishing. The good news is that the information is almost universally available online, free and easily accessed. We are, after all, subject to the law, and providing access to that law is the duty of the people who make it. However, providing readable access to the documents is often the only objective, and any means of accomplishing that objective is considered good enough. Documents are often published as PDFs, which are nice to read but really difficult for computers to understand. There is no uniformity between jurisdictions, minimal analysis capability (typically word search), and the links connecting references and citations between documents are most often missing. This is a less than ideal situation.

We live in an era where our legal institutions are expected to provide more transparency into their functions. At the same time, we expect more from computers than merely allowing us to read documents online. It is becoming more and more important to have machines interpret and analyze the information within documents – and without error. Today, if you want to provide useful access to legal information by providing value-added analysis capabilities, you must first tackle the task of interpreting all the variations in which laws are published online. This is a monumental task which then subjects you to a barrage of changes as the manner in which the documents are released to the public evolves.

So what if there was a uniform semantic web for legal documents? What standards would be required? What services would be required? Would we need to have uniform standards or could existing fragmented standards be accommodated? Would it all need to come from a single provider, from a group of cooperating providers, or would there be a looser way to federate all the documents being provided by all the sources of law around the world? Should the legal entities that are sources of law assume responsibility for publishing legal documents or should this be left to third party providers? In my coming posts I want to explore these questions.
