
And now for something completely different… Chinese!

Last week we saw how Akoma Ntoso can be applied to a very large consolidated code – the United States Code. This week we take the challenge in a different direction – applying Akoma Ntoso to a bilingual implementation involving a totally different writing system. Our test document this week is the Hong Kong Basic Law, the constitutional document of the Hong Kong Special Administrative Region of the People’s Republic of China. It was adopted on April 4, 1990 and went into effect on July 1, 1997, when the United Kingdom handed the region over to the People’s Republic of China.

The Hong Kong Basic Law is available in English, Traditional Chinese, and Simplified Chinese. For our exercise, we are demonstrating the document in English and in Traditional Chinese. (Thank you to Patrick for doing the conversion for me.) Fortunately, with modern technologies, supporting Chinese characters alongside Latin characters is quite straightforward. Unicode incorporates the Hong Kong Supplementary Character Set to handle characters unique to Hong Kong. The biggest challenge is ensuring that the Unicode declarations are set correctly throughout the various XML and HTML files that the information must flow through. With the number of accented names we find in California, as well as the rigorous nature of California’s publishing rules, getting Unicode right is something we have grown accustomed to.
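
As a rough illustration (this is just a sketch; the actual files and stylesheets in the Legix pipeline are more involved), keeping the encoding consistent means declaring UTF-8 at every step – in the source XML, in any XSLT that produces output, and in the HTML that is ultimately served:

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- the source XML declares its encoding explicitly -->

    <xsl:output method="html" encoding="UTF-8"/>
    <!-- any XSLT stylesheet in the pipeline emits UTF-8 as well -->

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
    <!-- and the generated HTML page declares the same charset -->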

While I hadn’t expected any problems with Unicode, I was pleasantly surprised to find that the fonts used in Legix simply worked with the Traditional Chinese characters without issue as well. (Well, at least as far as I can tell without the ability to actually read Chinese.)

The only issue we encountered was with Internet Explorer’s support for CSS3. Apparently, IE still does not recognize “list-style-type” with a value of “cjk-ideographic”. So instead of getting Traditional Chinese numerals, we get Arabic numerals. The other browsers handle this much better.
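
For reference, the declaration in question is just the standard CSS list numbering property applied to the ordered lists (a simplified sketch; the actual markup and selectors in Legix differ):

    <!-- with a CSS3-aware browser the items number 一, 二, 三, …;
         IE does not recognize the value and falls back to 1, 2, 3 -->
    <ol style="list-style-type: cjk-ideographic">
      <li>…</li>
      <li>…</li>
    </ol>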

So what other considerations were there? A big one was the referencing mechanism. To me, modeling how you refer to something in an information model can be more important than the information model itself. The referencing mechanism defines how the information is organized and allows you to address a specific piece of information precisely and accurately. Done right, any piece of information can be accessed quickly and easily. Done wrong, you get chaos.

Our referencing mechanism relies on the Functional Requirements for Bibliographic Records (FRBR). This mechanism is used by both SLIM and Akoma Ntoso. Another interesting FRBR proposal for legislation can be found here.

FRBR defines an information model based on a hierarchical scheme of Work-Expression-Manifestation-Item. Think of the work as the overall document being addressed, the expression as the version desired, the manifestation as the format in which you want the information presented, and finally the item as a means of addressing a specific instance of the information. Typically we’re only concerned with Work-Expression-Manifestation.
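
To make this concrete, here is a rough sketch of how Akoma Ntoso records the FRBR levels in the metadata block of a document. The element names come from the Akoma Ntoso schema (they have shifted slightly between schema versions), but the values shown are made up for illustration and are not the actual Legix metadata:

    <identification source="#xcential">
      <FRBRWork>
        <FRBRthis value="/hk/act/1990-04-04/basic-law/main"/>
        <FRBRuri value="/hk/act/1990-04-04/basic-law"/>
      </FRBRWork>
      <FRBRExpression>
        <FRBRthis value="/hk/act/1990-04-04/basic-law/zh-yue/main"/>
        <FRBRuri value="/hk/act/1990-04-04/basic-law/zh-yue"/>
        <FRBRlanguage language="zh-yue"/>
      </FRBRExpression>
      <FRBRManifestation>
        <FRBRthis value="/hk/act/1990-04-04/basic-law/zh-yue/main.html"/>
        <FRBRuri value="/hk/act/1990-04-04/basic-law/zh-yue.html"/>
      </FRBRManifestation>
    </identification>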

For a bilingual or multilingual system, the “expression” part of the reference is used to specify which language you wish the document to be returned in. If you check out the references at Legix.info, you will see the two references to the Hong Kong Basic Law.

The expressions are called out as “doc;en-uk” for the English version and “doc;zh-yue” for the Chinese version. Relatively straightforward. The manifestation is not specified, so the result is the default manifestation, which is HTML.

Check the samples out and let me know what you think.


Applying Akoma Ntoso to the United States Code

A few weeks ago the U.S. House of Representatives’ Committee on House Administration held a one-day Legislative Data and Transparency Conference. While I was not able to attend in person, I did listen to the presentations via the live stream that was provided.

Of all the things I learned that day, one specific detail intrigued me the most – that there is an XML representation of the United States Code that has been made available. This XML is available at http://uscode.house.gov/xml. While the data is a little stale and some titles are mysteriously absent (Title 14, the repealed Title 34, and Title 51), it is a great source to begin experimenting with the United States Code.

One question, asked by Sarah Schacht of Knowledge As Power, was why there wasn’t very much interest in Akoma Ntoso at the federal level. For me, the answer wasn’t altogether satisfying, but it did give me an idea! How about I try to transform the available XML files into Akoma Ntoso, as best I know how? That way, I could learn for myself how well Akoma Ntoso adapts to the needs of the US federal government. Admittedly the US Code is only one aspect of the overall issue, but it is a reasonable place to start.

The effort took me just a few days, and now I have (almost) the full United States Code available in Akoma Ntoso. You can find it on my Legix.info site under United States Laws. Click on the “AKN” link in the upper right of each file to see the Akoma Ntoso rendition. As a bonus, I also updated the United States Constitution that we had already prototyped to use the latest transforms for federal data. With thanks to Monica Palmirani at the University of Bologna and Flavio Zeni at UNDESA for their help, I was able to get what I think is a fairly reasonable rendition of the United States Code in Akoma Ntoso. Additionally, I have transforms into the SLIM formats and into an HTML presentation format.
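
To give a flavor of what the transforms do, here is a much-simplified XSLT sketch. The source-side element and attribute names below are hypothetical stand-ins rather than the actual uscode.house.gov vocabulary, and the real stylesheets handle far more structure:

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns="http://www.akomantoso.org/2.0">
      <!-- default namespace set to the Akoma Ntoso release being targeted -->

      <!-- map a (hypothetical) source section into an Akoma Ntoso section -->
      <xsl:template match="section">
        <section>
          <num><xsl:value-of select="@number"/></num>
          <heading><xsl:value-of select="heading"/></heading>
          <xsl:apply-templates select="subsection"/>
        </section>
      </xsl:template>

      <!-- further templates handle titles, chapters, subsections, notes, and so on -->

    </xsl:stylesheet>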

So what have I learned? First of all, Akoma Ntoso adapted quite easily to the hierarchical model of the United States Code. That isn’t too surprising, as the U.S. Code’s hierarchy isn’t unusual and Akoma Ntoso is quite flexible in this regard. However, I do have an issue with managing a document as large as the United States Code. From what I can tell, the component mechanism within Akoma Ntoso simply doesn’t adapt to modeling a very large code. I need some sort of composition or inclusion mechanism that would allow the single US Code to be modeled as a composite document made up of many files, preferably in some sort of hierarchical arrangement. Currently I have modeled the US Code as 48 different “Acts” corresponding to the available titles within the US Code, but this is far from ideal. Modeling the individual titles as acts is not accurate and still does not resolve the scalability issues, but it is the best I could figure out at this time. In the past I have found Monica and her team to be quite responsive to issues such as this, so hopefully we will have a quick resolution. Maybe I simply don’t know enough about Akoma Ntoso to model a large document adequately.
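
For the curious, each title currently comes out looking roughly like the sketch below. The structure and element names follow Akoma Ntoso, but the content shown is a placeholder rather than the actual generated markup:

    <akomaNtoso>
      <act>
        <meta>
          <!-- FRBR identification and other metadata for this one title -->
        </meta>
        <body>
          <!-- an entire U.S. Code title, wrapped as a standalone "act",
               with its chapters and sections nested in a single file -->
          <chapter>
            <num>CHAPTER 1</num>
            <heading>…</heading>
            <section>
              <num>§ 1.</num>
              <heading>…</heading>
              <content>
                <p>…</p>
              </content>
            </section>
          </chapter>
        </body>
      </act>
    </akomaNtoso>

Forty-eight of these files sit side by side, and nothing in the markup itself ties them together into a single code – which is exactly the composition and scalability gap described above.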

My effort is just a start. I still have lots to learn about how best to apply Akoma Ntoso in various contexts. I will be refining my transforms as time allows in the weeks to come. Take a look at what I have done and let me know what you think. I welcome all feedback, both constructive and otherwise. My intent in publishing all the experiments and research that we do at Xcential is to share what we know with the legal informatics community in the hopes of fostering a more collaborative spirit amongst us all. So please send me your comments!
