Process, Standards, W3C

What is a Semantic Web?

Tim Berners-Lee, inventor of the World Wide Web, defines a semantic web quite simply as “a web of data that can be processed directly and indirectly by machines”. In my experience, that simple definition quickly becomes confusing as people add their own wants and desires to it. Technologies like RDF, OWL, and SPARQL are considered key components of semantic web technology. It seems, though, that these technologies add so much confusion through abstraction that non-academic people quickly steer as far away from the notion of a semantic web as they can get.

So let’s stick to the simple definition from Tim Berners-Lee. We will simply distinguish the semantic web from our existing web by saying that a semantic web is designed to be meaningful to machines as well as to people. So what does it mean for a web of information to be meaningful to machines? A simple answer is that there are two primary things a machine needs to understand about a web: first, what the pages are about, and second, what the relationships that connect the pages together are about.

It turns out that making a machine capable of understanding even the most rudimentary aspects of pages and the links that connect them is quite challenging. Generally, you have to resort to fragile custom-built parsers or sophisticated algorithms that analyze the document pages and the references between them. Going from pages with lots of words connected somehow to other pages to a meaningful information model is quite a chore.

What we need to improve the situation are agreed-upon information formats and referencing schemes that machines can more readily interpret. Defining what those formats and schemes are is where the subject of semantic webs starts getting thorny. Before trying to tackle all of this, let’s first consider how this all applies to us.

What could benefit more from a semantic web than legal publishing? Understanding the law is a very complex subject which requires extensive analysis and know-how. This problem could be simplified substantially using a semantic web. Legal documents are an ideal fit to the notion of a semantic web. First of all, the documents are quite structured. Even though each jurisdiction might have their own presentation styles and procedural traditions, the underlying models are all quite similar around the world. Secondly, legal documents are rich with relationships or citations to other documents. Understanding these relationships and what they mean is quite important to understanding the meaning of the documents.
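To make those two ingredients concrete, here is a minimal sketch in Python of what "meaningful to machines" might look like for a handful of legal documents. It records what each document is and how documents relate to one another as simple subject, relationship, object statements. The document identifiers, the relationship names (`type`, `amends`, `cites`), and the helper functions are all invented for illustration; they are not drawn from any real standard or jurisdiction.

```python
# A tiny in-memory "web" of legal documents, stored as
# (subject, relationship, object) statements.
# All identifiers and relationship names are hypothetical.
statements = [
    ("bill/example-1", "type", "Bill"),
    ("code/example/section-100", "type", "CodeSection"),
    ("code/example/section-101", "type", "CodeSection"),
    ("bill/example-1", "amends", "code/example/section-100"),
    ("bill/example-1", "cites", "code/example/section-101"),
    ("code/example/section-100", "cites", "code/example/section-101"),
]


def objects_of(subject, relationship):
    """Return everything that `subject` points to via `relationship`."""
    return [o for s, r, o in statements if s == subject and r == relationship]


def subjects_of(relationship, obj):
    """Return everything that points to `obj` via `relationship`."""
    return [s for s, r, o in statements if r == relationship and o == obj]


# Which sections does the bill amend?
print(objects_of("bill/example-1", "amends"))

# Which documents cite section 101?
print(subjects_of("cites", "code/example/section-101"))
```

Because each relationship carries a name, a program can tell an amendment from a plain citation without parsing any prose, which is the kind of question a word search cannot answer.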

So let’s consider the current state of legal publishing, and from my perspective, legislative publishing. The good news is that the information is almost universally available online in a free and easily accessed format. We are, after all, subject to the law, and providing access to that law is the duty of the people who make the laws. However, providing readable access to the documents is often the only objective, and any means of accomplishing it is treated as good enough. Documents are often published as PDFs, which are nice to read but really difficult for computers to understand. There is no uniformity between jurisdictions, analysis capability is minimal (typically word search), and links connecting references and citations between documents are most often missing. This is a less than ideal situation.

We live in an era where our legal institutions are expected to provide more transparency into their functions. At the same time, we expect more from computers than merely allowing us to read documents online. It is becoming more and more important to have machines interpret and analyze the information within documents – and without error. Today, if you want to provide useful access to legal information by providing value-added analysis capabilities, you must first tackle the task of interpreting all the variations in which laws are published online. This is a monumental task which then subjects you to a barrage of changes as the manner in which the documents are released to the public evolves.

So what if there was a uniform semantic web for legal documents? What standards would be required? What services would be required? Would we need to have uniform standards or could existing fragmented standards be accommodated? Would it all need to come from a single provider, from a group of cooperating providers, or would there be a looser way to federate all the documents being provided by all the sources of law around the world? Should the legal entities that are sources of law assume responsibility for publishing legal documents or should this be left to third party providers? In my coming posts I want to explore these questions.

Standards

To go Open Source or Not?

It is my dream to establish a legal informatics industry. Today, legal informatics is conducted either as an internal function or by consulting firms that specialize in long multi-year projects to build custom solutions. The few commercial products that exist are in the form of proprietary products or web services. Compared to many other forms of informatics, legal informatics has evolved very slowly. Part of the reason for this, of course, is the specialized nature of this field. This is particularly the case with legislative information, where each legislature or parliament has long-established traditions that are difficult to change.

As with every other informatics field, an industry will be established eventually. The costs of custom-built software are simply economically impractical in many cases, demanding a re-think about how solutions are created. For me, a key part of establishing that industry is the creation of standards. Whether there are official standards or de facto standards, standards will spur on the creation of an industry by creating a common model upon which to build. I have seen this happen in other industries that I have participated in and I don’t see why legal informatics should be any different. Yes, legal informatics is tardy in this regard, but that slowness should not discourage us from making it happen now.

So the question is quite simple. Can an open source solution form the basis of a de facto standard for legal informatics? And if so, what does that solution need to consist of? That is the question we have been wrestling with. There are two sides to this argument. While we might want to promote the establishment of an industry, at the same time we need to provide an economic incentive that will encourage businesses to participate. Could providing too much of an open source solution merely enable existing players to be more efficient, yet continue to work in relative isolation? It seems that a better outcome would be to promote the creation of interoperable products that can be mixed or matched to solve the multitude of needs in this field.

To this end, we’re trying a two-pronged approach. First, we are fully supporting the establishment of official standards through bodies such as OASIS. Secondly, understanding that the official standard route is going to be a slow and perhaps arduous process, we’re pushing for the establishment of de facto standards. Toward this second goal, we have open-sourced our own SLIM model for legislation. It’s a very simple XML model based on 10 years’ experience building these types of models. While it isn’t a be-all and end-all solution, it is consistent with current XML thinking and is quite easy to adopt. I have recently spent a fair amount of time wrestling with how to release it as an open source package. There are two questions I have:

  1. Which model? There are so many to choose from: Creative Commons, GNU, BSD, etc. Which model is permissive enough without discouraging commercial entities from adopting it?
  2. What aspects should be open source? I think it is quite clear that any and all XML information models should be open. That is in the spirit of XML, is consistent with the public domain nature of legislative information, and will allow the data to be accurately interpreted long after any particular software application or company has run its course. But would providing foundational software packages that are also open source further encourage the adoption of the model? And if that is the case, what foundation software would be beneficial?

At this point we have answered the first question and the first part of the second. We have chosen to release the SLIM XML schema as open source by using a Creative Commons Attribution-ShareAlike license. The rest of the second question remains open. What else should we provide with an open source license? Certainly it cannot be our full software suite. We are a commercial business and we need to make a living. But parts of our packages could be released to promote the adoption of SLIM as a de facto standard of sorts. What do you think?

Process, Standards

Welcome to my new blog on Legal Informatics

Imagine that all the world’s laws are published electronically in an open and consistent manner. Imagine that you or your business can easily research the laws to which you are subject. Imagine an industry that caters to the needs of the legal profession based on open worldwide standards.

Of course, there are many reasons why this is just not possible. Every legislature or parliament has its own way of doing things. Every country has its own unique legal system. Every jurisdiction has its own unique traditions. It simply isn’t possible that all these unique requirements could be harmonized to achieve that vision. Of course not… But it will happen. It might take 50 years, but eventually it will happen. We can debate endlessly why it won’t. We can argue over nuances that get in the way forever. That’s not why I am writing this blog.

I want to open the discussion to how it might happen. What steps can we start taking right now that will lead us towards our eventual goal? We live in an era where there is widespread dissatisfaction with the way our governments pass laws. There are constant calls for better transparency into the workings of the legislative process. The dissatisfaction we all feel has created an opportunity for entrepreneurial startups. Their goals are most often to effect change in government. For those of us with existing experience in this field, how can we harness our knowledge and work with these emerging efforts to achieve a greater good for us all?

I’ve spent the past ten years in this field, working as a consultant and developer primarily to the State of California. See my About for more about me. Now, with that experience to draw upon, I am hoping to make this blog a useful tool to others that might learn from my past. I’m going to make this blog a regular part of my life – posting regularly, maybe weekly. With each post I want to raise a number of questions and open up thoughtful discussions. Some of the topics I have in mind:

  • How do we balance openness and transparency with business opportunity?
  • Do we need open standards? If not now, when?
  • When it comes to openness and transparency, what is the government’s responsibility?
  • Are there technologies we need to focus on?
  • Isn’t this a Semantic Web for law? What does that mean anyway?
  • And from time to time I will share some of the questions I get each week about how to model legislation in XML. I’ll try not to get bogged down in technical minutiae.

What else? Please leave me a comment with your suggestions. Rather than just being a blog, I would like to see this grow into more of a conversation about how legal informatics can be applied to achieve a truly beneficial semantic web for law.

What could your role be in all this? Are you a government agency, a not-for-profit, a fledgling startup, a publishing company, or even a technology supplier or consultant like myself? Regardless of who you are, I am asking for your participation in this blog. Together we can shape the future of how legal information is shared around the world.

So let’s get started… My next post will start with a question I have been wrestling with lately – How can we heed the call for better open source data without hindering the for-profit motive that will foster an industry?
