In my blog last week, I talked a little about our efforts to improve how citations are handled. This week, I want to go into more detail. I’ve been participating in a few projects to improve how legal citations and references are handled.
Let’s start by looking at the need. Have you noticed how difficult it is to look up many citations found in legislation published on the web? Quite often, there is no link associated with the citation. You’re left to do your own legwork if you want to look up that citation – which probably means you’ll take the author’s word for it and not bother to follow the citation. Sometimes, if you’re lucky, you will find a link (or reference) associated with the citation. It will point to a location, chosen by the author, that contains a copy of the legal text being referenced.
What’s the problem with these references?
- If you take a look at the reference, chances are it’s a crufty URL containing all sorts of gibberish that’s either difficult or impossible to interpret. The URL reflects the current implementation of the data provider. It’s not intended to be meaningful. It follows no common conventions for how to describe a legal provision.
- Wait a few years and try to follow that link again. Chances are, that link will now be broken. The data provider may have redesigned their site or it might not even exist anymore. You’re left with a meaningless link that points nowhere.
- Even if the data does exist, what’s the quality of the data at the other end of the link? Is the text official text, a copy, or even a derivative of the official text? Has the provision been amended? Has it been renumbered? Has it been repealed? What version of that provision are you looking at now? These questions are all hard to answer.
- When you retrieve the data, what format does it come in? Is it a PDF? What if you want the underlying XML? If that is available, how do you get it?
The object of our efforts, both at the standards committee and within the projects we’re working on at Xcential, is to tackle this problem. The approach involves designing meaningful URLs which are descriptive, unambiguous, and can last for a very long time – perhaps decades or longer. These URLs are independent of the current implementation – they may not reflect how the data is stored at all. Figuring out how to retrieve the data, using the current underlying content management system, is the job of a “resolver”. A resolver is simply an adapter that is attached to a web server. It intercepts the properly designed URL references and then transparently maps them into the crufty old URLs which the content management system requires. The data is retrieved from the content management system, formatted appropriately, and returned as if it really did exist at the properly designed URL which you see. As the years go by and technologies advance, the resolver can be adapted to handle new generations of content management systems. The references themselves will never need to change.
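To make the resolver idea a little more concrete, here is a minimal sketch in Python. Everything specific in it is an assumption made up for illustration: the stable URL layout (`/us/usc/t26/s501`), the backend address (`cms.example.org/fetch.do`), and the query parameter names are hypothetical and do not describe any real system or any scheme the standards committee has settled on. It only shows the mapping step, from a well-designed reference to whatever the content management system happens to need today.

```python
import re
from urllib.parse import urlencode

# Hypothetical stable URL layout (an assumption for this sketch):
#   /<jurisdiction>/<collection>/t<title>/s<section>
STABLE_PATTERN = re.compile(
    r"^/(?P<jurisdiction>[a-z]{2})/(?P<collection>[a-z]+)"
    r"/t(?P<title>\d+)/s(?P<section>[0-9a-z]+)$"
)

# Hypothetical address of the current content management system.
BACKEND_BASE = "https://cms.example.org/fetch.do"


def resolve(stable_path: str) -> str:
    """Map a stable, descriptive path to the backend's current URL."""
    match = STABLE_PATTERN.match(stable_path)
    if match is None:
        raise ValueError(f"not a recognised reference: {stable_path}")
    parts = match.groupdict()
    # The query parameters below are invented; a real resolver would use
    # whatever its content management system actually requires.
    query = urlencode({
        "juris": parts["jurisdiction"],
        "coll": parts["collection"].upper(),
        "docid": f"{parts['title']}-{parts['section']}",
        "fmt": "xml",
    })
    return f"{BACKEND_BASE}?{query}"


if __name__ == "__main__":
    print(resolve("/us/usc/t26/s501"))
```

If the content management system is replaced a few years from now, only the body of `resolve` would need to change in this sketch. The stable path that readers and other documents rely on stays the same.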
There are many more details to go into. I’ll leave those for future blogs. Some of the problems we are tackling involve mapping popular names into real citations, working through ambiguities (including ones created in the future), handling alternate data sources, and allowing citations to be retrieved at varying degrees of granularity.
I believe that solving the legal references problem is just about the most important progress we can make towards improving the legal informatics field. It’s an exciting time to be working in this field.
2 thoughts on “Improving Legal References”
This is a great-sounding solution. It was always painful to hear why web systems did not or could not allow the URLs to be controlled by the publisher. And still fragment URLs (id attribute based) are not the norm. By building URLs that are well constructed for dereferencing the citation even if the link fails, we have a solution that works for many things.
For example, even as a dead link and before dereferencing to search based on the components, the link serves as a search term (especially for systems with advanced searches for a href links). Where URNs are not native to HTML (save namespacing), URLs are the lifeblood of the web and HTML. Also, these URLs, even dead, will be crucial as metadata/data since they are universally unique strings. Design a health care system that must comply with the law, for example, HIPAA, and just add a field for HIPAA to each EMR (electronic medical record); then any data dump will include the machine-readable citation that allows any reuser of the data to know how the information can legally be used, even if the reuser is unfamiliar with the term HIPAA or the law. And auditors can build systems to make sure software systems and data are being properly handled.
Of course, it would be great if all of the citations were live links that took a person to the section that is being cited. Using id attributes, the citation can now reference a block of the document, not just a point in the document. This also means that the electronic citation is a machine-readable way to transfer the full text or HTML of the cited text.
Thanks Grant for your fantastic efforts in this regard.