
Wouldn’t national information exchange be better served by deferring the National Information Exchange Model (NIEM) and instead implementing some sort of Google-like search of federal, state, and municipal text data records? Most federal, state, and local data resides in sophisticated databases managed with their own information management tools, but such tools all seem to support ways to create a PDF, DOC, or other text output for their information records. Once in text form, such data could easily be indexed by Google or other search engines and thus searched by any term in the text record.
Now this could never completely replace NIEM, e.g., it could never offer even “close-to” real-time information sharing. But true real-time sharing would be impossible even with NIEM. And whereas NIEM is still under discussion today (years after its initial draft) and will no doubt require even more time to fully implement, text based search could be available today with minimal cost and effort.
What would be missing from a text based search scheme vs. NIEM:
- “Near” real-time sharing of information
- Security constraints on information being shared
- Contextual information surrounding data records
- Semantic information explaining data fields
Text based information sharing in operation
How would something like a Google-type text search work to share government information? As discussed above, government information management tools would need to convert data records into text. This could be a PDF, text file, DOC file, or PPT, and more formats could be supported in the future.
Once text versions of data records were available, they would need to be uploaded to a (federally hosted) special website where a search engine could scan and index them. Indexing such a repository would be no more complex than doing the same for the web today. Even so, it will take time to scan and index the data. Until this is done, the data will not be searchable. However, Google and others can scan web pages in seconds and often scan websites daily, so the delay may be as little as minutes to days after data upload.
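The convert-then-index flow described above can be sketched in a few lines of Python. This is only a toy illustration, not how Google or any production search engine works: the record fields, record ids, and agency names are invented, and the "index" is a simple in-memory word-to-record map.

```python
import re
from collections import defaultdict


def record_to_text(record: dict) -> str:
    """Flatten a database record into plain text, one 'field: value' line per field."""
    return "\n".join(f"{field}: {value}" for field, value in record.items())


def build_index(documents: dict) -> dict:
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index: dict, query: str) -> set:
    """Return the ids of documents that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical records, invented purely for illustration.
records = {
    "tx-001": {"agency": "Example County Sheriff", "type": "incident report",
               "summary": "Stolen vehicle recovered"},
    "ca-002": {"agency": "Example City Police", "type": "citation",
               "summary": "Speeding on Main Street"},
}

documents = {doc_id: record_to_text(rec) for doc_id, rec in records.items()}
index = build_index(documents)
print(search(index, "stolen vehicle"))  # prints {'tx-001'}
```

Note that once a record is flattened this way, field names and values are indexed alike, which is exactly the loss of context and semantics listed above.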
Securing text based search data
Search security could be accomplished in any number of ways, e.g., with different levels of websites or directories established at each security level. Assuming one used different websites then Google or another search engine could be directed to search any security level site at your level and below for information you requested. This may take some effort to implement but even today one can restrict a Google search to a set of websites. It’s conceivable that some script could be developed to invoke a search request based on your security level to restrict search results.
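As a rough sketch of such a script, the Python below builds a search query restricted to the sites at or below a user's security level, using the `site:` operator that Google search supports. The site names and the three numeric levels are assumptions made up for this example. A query filter like this only narrows results; the sites themselves would still need real access control.

```python
# Hypothetical repository sites per security level (invented for illustration).
SITES_BY_LEVEL = {
    1: ["public.records.example.gov"],
    2: ["sensitive.records.example.gov"],
    3: ["restricted.records.example.gov"],
}


def allowed_sites(clearance: int) -> list:
    """List every site at the given clearance level and below, lowest level first."""
    return [
        site
        for level in sorted(SITES_BY_LEVEL)
        if level <= clearance
        for site in SITES_BY_LEVEL[level]
    ]


def restricted_query(terms: str, clearance: int) -> str:
    """Append a site: filter so the search only covers sites the user may see."""
    sites = allowed_sites(clearance)
    if not sites:
        raise PermissionError("no searchable sites at this clearance level")
    site_filter = " OR ".join(f"site:{site}" for site in sites)
    return f"{terms} ({site_filter})"


print(restricted_query("stolen vehicle", 2))
```

A level 2 user's query would cover the level 1 and level 2 sites but never mention the level 3 site.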
Gaining participation
Once the upload websites/repositories are up and running, getting federal, state, and local governments to place data into those repositories may take some persuasion. Federal funding can be used as one means to enforce compliance. Bootstrapping data loading into the searchable repository can help ensure initial usage, and once that is established, ease of access and search effectiveness can hopefully help ensure its continued use.
Interim path to NIEM
One loses all contextual and most semantic information when converting a database record into text format but that can’t be helped. What one gains by doing this is an almost immediate searchable repository of information.
For example, Google can be licensed to operate on internal sites for a fair but high fee and we’re sure Microsoft is willing to do the same for Bing/Fast. Setting up a website to do the uploads can take an hour or so by using something like WordPress and file management plugins like FileBase but other alternatives exist.
Would this support the traffic for the entire nation’s information repository? Probably not. However, it would be a quick and easy proof of concept which could go a long way toward getting information exchange started. Nonetheless, I wouldn’t underestimate the speed and efficiency of WordPress, as it supports a number of highly active websites/blogs. Over time such a WordPress website could be optimized, if necessary, to support even higher performance.
As this takes off, perhaps the need for NIEM becomes less time sensitive, allowing it to take a more reasoned approach. Also, as the web and search engines become more semantically aware, perhaps the need for NIEM diminishes. Even so, there may ultimately need to be something like NIEM to facilitate increased security, real-time search, database context and semantics.
In the meantime, a more primitive textual search mechanism such as described above could be up and available for download within a day or so. True, it wouldn’t provide real-time search and wouldn’t provide everything NIEM could do, but it could provide viable, actionable information exchange today.
I am probably oversimplifying the complexity of providing true information sharing, but such a capability could go a long way to help integrate the governmental information sharing needed to support national security.
Your statement that NIEM is still under discussion is not correct. NIEM has been and is being used to facilitate information sharing all over the country. Texas built its "Texas path to NIEM" converting some 26 different core justice exchanges to NIEM conformant exchanges. The FBI has implemented its National Data Exchange Program based on NIEM with over 60 million incident reports already on-line in a consolidated, searchable database accessible to all law enforcement agencies in the country.
Converting all records to text would make it difficult to provide the protections for access to specific data elements that can be provided in a structured database to protect privacy and civil liberties in accordance with the varying state laws governing dissemination. For the constructive protection of constitutional liberties, we need to carefully construct the capacity for federated queries to ensure that the protections that the originator of the data intends are maintained as the data crosses state lines.
Paul,
Thanks for the correction. It was my mistake to say that it's not being used today.
However, I am a bit curious as to how data element protection differs from data record protection. Are you talking about restricting portions of a data record from general access? And this would differ from one field to another and potentially be different than record access? If this is a necessity then textualizing the records would never work. I would counter by asking which state laws define such "constructive protection" and should this differ from state to state within the union?
Ray
There is definitely room for both approaches to information sharing: Google and search provide a great tool for finding unstructured content. NIEM provides data layer interoperability across many databases, thereby reducing the number of point-to-point connections.
What is really powerful is when you have some NIEM structured content that can be used to analyze relationships linked to unstructured document type data. Together, NIEM and Google are tools for current information sharing sets.
Donna, I like your thinking. Neither text based search nor NIEM need exist in isolation and together they can even provide better information sharing.
Ray
The biggest thing that I would think would be missing from a Google search is the difference between searching based on syntax (as Google works now, for the most part) and the real business need to search based on predefined semantics. Having a fairly common last name means that I would not want information exchanged between agencies based on syntax. Semantic technologies are making progress on this front, but even well-funded projects still struggle with pulling context out of masses of data. Knowing that "the Chairman" might mean Mao, might mean the CEO of a company, or might mean Frank Sinatra isn't something that Google (or any other search engine) currently does with any sense of accuracy.
Actually, Google has started doing that (for that matter, even Yahoo). Google introduced a feature called Rich Snippets. They have been playing around with microformats and RDFa, and they released an official version last year. As far as I know they only support two types of objects at this point: people and reviews. As long as you can mark up your web pages with RDFa, the search results could become more meaningful.
Here is a blurb from Google how useful the mark-up would be:
"Imagine that you have a review of a restaurant on your page. In your HTML, you show the name of the restaurant, the address and phone number, the number of users who have provided reviews, and the average rating. People can read and understand this information, but to a computer it is nothing but strings of unstructured text. With microformats or RDFa, you can label each piece of text to make it clear that it represents a certain type of data: for example, a restaurant name, an address, or a rating. This is done by providing additional HTML tags that computers understand. These don’t affect the appearance of your pages, but Google and any other services that look at the HTML can use the tags to better understand your information, and display it in useful ways—for example, in search results"
Srini,
Great comment. But the one thing I would emphasize is that once you move from special-purpose tools such as NIEM to something with broader applicability, one can take advantage of any advancement made in the broader tool. I was speaking more of basic search accuracy, on which Google and their brethren are probably spending millions of dollars each year, and of their broadening the search domain to different file types, voice, images, and video. Any of these capabilities can make for a better search result, leading to better information sharing. The fact that semantic web capabilities are also starting to be realized just proves my point.
Ray
Boy, this is a TOTAL Apples to Oranges comparison… Have you ever even seen a NIEM Message?
Would you somehow suggest that a business stop sending purchase orders to buy things from suppliers?
NIEM is about transactional Messaging. Google is about Search. Apples. Oranges.
Mike,
I thought the purpose of NIEM was to share information across domains so as to provide a broader base for decision making. But to tell the truth, no, I have never seen a NIEM message. And NO, I would not suggest a business stop sending purchase orders to buy things. BUT if I was interested in aggregating purchase orders across 1000s of municipalities and local governments, 50 states, and 100s of national government agencies, I might consider asking for text copies of all of them so that they could be easily searched.
Other comments have come in describing the semantic nature of NIEM information, and I agree much of that will be lost from a purely text based search. But what I was trying to say is that if you are interested in quickening the sharing of information across disparate organizations, then the best way, from my perspective, is text based search.
Ray
Ray: The premise of this article, suggesting text-based searching as an alternative to NIEM, is faulty. As Mike says, it's exactly like comparing apples to oranges. Your article assumes NIEM is about searching information, and it's really not. It's about sharing information. NIEM is a standardized vocabulary to facilitate the development of information exchange standards, where those standards enable sharing of information among disparate systems.
I'm not going to comment on whether your idea of a "(federally hosted) special website" is a good or even viable idea, but it's well beyond the scope of NIEM. NIEM is not a centralized system. The NIEM model recognizes and supports the importance of system distribution and autonomy.
There's a lot of good information on the niem.gov website that could help you better understand NIEM (both the program and the model).
Andrew,
Thanks for your comments and I do appreciate them. You are right, I had not completely thought out all the implications of a "(federally hosted) special website". But I would say that the reason for the special website was to better secure this information and limit who could search through it. But as you say this is well beyond the scope of NIEM.
However, it's hard for me to see the distinction you, Mike and others make between "sharing" and "searching" information. I look at the web and see it as one massive undertaking in sharing information that is mostly facilitated by extensive search. The web prior to search was mostly shared via directories and emailing links to one another. All this is still done of course but search is so much more effective in allowing self-directed research into a topic area.
I have trouble seeing NIEM sharing information capabilities that couldn't be done better and quicker via text based search engines. Granted, securing access to such information by a single website represents perhaps difficult jurisdictional issues. However, what I was trying to propose is that there is a solution, readily doable today. Such an approach doesn't involve significant debate, significant development or a lengthy implementation. I understand that such an approach eliminates semantic information, minimizes information firewalls between jurisdictions, and perhaps compromises data security. But such problems can all be approached with less time and with considerably less expense than what NIEM represents in my mind.
Ray
Ray: please consider the following scenario to hopefully clarify your confusion about “information sharing”:
A law enforcement agency has the responsibility of issuing citations. To manage these citations, they have their own electronic citation system where they enter and maintain all citation information. Other agencies’ business processes could be affected by the citations issued by the law enforcement agency. Because of this, the law enforcement agency will need to electronically share the citations with agencies such as courts, probation, etc. So there may be a requirement that the court initiates a case every time the law enforcement agency issues a citation. To make this happen, the court’s case management system would receive the citation from the law enforcement agency’s citation system. Furthermore, in order for the court to initiate a citation case, they need to receive specific information from the law enforcement agency. This is where NIEM comes into play. NIEM is used to specifically define the content that must be “shared” between the two systems in order to support their business processes. This way, every time a citation is sent from the law enforcement citation system to the courts case management system, it is in the same exact format and consists of the same semantics. This is “information sharing”, as you can see it is much different than just “searching”.
Consider a similar scenario for probation. Perhaps probation wants to be made aware of Citations in case a citation violates the condition of someone’s probation. The same NIEM-based exchange standard could be used to define the sharing of information between the law enforcement agency citation systems and probation’s case management system.
Of course I have breezed over a lot of the technical detail, but I hope this solidifies the point.
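To make the scenario concrete, here is a minimal Python sketch of one agreed citation format shared by a citation system, a court system, and a probation system. It is not NIEM itself: real NIEM exchanges are defined with XML schemas, and this sketch substitutes a Python dataclass and JSON to keep it short. Every field name and value below is hypothetical.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class CitationMessage:
    """The agreed content of a citation exchange (hypothetical field names)."""
    citation_number: str
    issued_date: str  # ISO 8601 date, e.g. "2010-06-01"
    subject_name: str
    violation_code: str
    issuing_agency: str


def send_citation(message: CitationMessage) -> str:
    """Law enforcement side: serialize the citation in the agreed format."""
    return json.dumps(asdict(message), sort_keys=True)


def receive_citation(payload: str) -> CitationMessage:
    """Receiving side: reject anything that does not match the agreed format."""
    data = json.loads(payload)
    expected = {f.name for f in fields(CitationMessage)}
    if not isinstance(data, dict) or set(data) != expected:
        raise ValueError("payload does not match the agreed citation format")
    return CitationMessage(**data)


def open_court_case(message: CitationMessage) -> dict:
    """Court system: initiate a case from the received citation."""
    return {"case_for": message.citation_number, "charge": message.violation_code}


def violates_probation(message: CitationMessage, probationers: set) -> bool:
    """Probation system: flag citations issued to people on probation."""
    return message.subject_name in probationers


citation = CitationMessage("C-1001", "2010-06-01", "Pat Example", "SPD-15", "Example PD")
payload = send_citation(citation)
received = receive_citation(payload)
print(open_court_case(received))
print(violates_probation(received, {"Pat Example"}))
```

The same payload feeds both consumers, which is the point of defining the exchange once rather than per pair of systems.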
Andrew,
Again thanks for the clarification. Admittedly, I have not spent as much time on the NIEM.gov website to see that this was its intention. On my cursory reading I thought this had something to do with intelligence agencies trying to access information that's buried in proprietary database systems that are maintained outside the federal government.
As you explain it, I now understand that NIEM's purpose is to facilitate the development of inter-related systems that need triggering events and information to provide their portion of some more extensive cross-system transaction. If such is the case, then I FULLY AGREE with what you have been saying and emphatically state that text search is NOT the solution to this problem.
However, the mandate on the home page of niem.gov states:
It is designed to develop, disseminate and support enterprise-wide information exchange standards and processes that can enable jurisdictions to effectively share critical information in emergency situations, as well as support the day-to-day operations of agencies throughout the nation.
I understand how day-to-day operations are facilitated by NIEM as defined, but fail to see how such information sharing supports the more generic, free ranging purposes required to deal with emergency situations. This is where text based search can be very effective.
So maybe this turns us back to the comment made earlier that a combination of NIEM as defined and something like text based search may be what's really needed in order to honor NIEM's designated purpose.
Ray
So to recap, you didn't really understand NIEM, but offered a public opinion on its value compared to Google. Hmmm. Not well thought out or researched blog. Won't get bookmarked by me. Second, good for NIEM getting compared to the industry giant. They must be doing something right.
Donna: You are a bit harsh but perhaps I deserve it. However, I still think that NIEM is not the complete answer even after all the discussion and stand by that.
Ray
Ray: I, and I think most others, would agree that NIEM is not the complete answer to information sharing, nor does it intend to be. There are several other pieces of the puzzle. NIEM (a standardized vocabulary) is the critical piece of that puzzle that makes semantic interoperability possible.
Andrew: I am in complete agreement with you there
Ray
Ray, this highlights the expertise gap between those with storage backgrounds and those in information management.
I've been seeing this a lot on storage-related blogs over the past couple of years. The recent Wikibon article about deduplication is a great example. Upon reading it I could immediately tell someone with a storage background wrote it…the author wasn't aware that "deduplication" existed outside storage long before the storage industry adopted and improved upon it.
We were involved in the first collaborative project between SNIA and ARMA in 2005 primarily because of our expertise in both fields – the goal was to discuss convergence between storage and information management. Months of discussion. Very little agreement. Lots of superficial compromise. And 5 years later, little or no progress.
Frankly, I'm disappointed by both communities. Neither side is really taking the time to genuinely understand the other. Sure there are small pockets of progress, but that's about it. Most information managers are still under the mistaken impression that they actually control their information assets. They have no idea about the depth and breadth of the tentacles of storage. And at least some storage vendors and practitioners seem to believe what they do is adequate information management.
You're really going to need to do your homework if you want to extend your coverage beyond storage topics into information management. Likewise, I would recommend that those outside storage take the time to understand some of the basics of storage technology and its enormous impact on the handling and protection of information assets.
Joseph: Thanks for your comments. Although I place myself firmly in the storage analyst side of this discussion, my experience is much wider than just storage. I find it interesting that something as simple as sharing information can be deemed so difficult to accomplish.
I think it was Shunryu Suzuki who said "In the beginner's mind there are many possibilities, but in the expert's mind there are few." Sometimes, one has to step back from a process and take a look at what the original intent of the process was and see if there just isn't an easier way to accomplish this.
All that being said, NIEM has a place to play as an inter-system transport for transactional information. I just think that augmenting that with something akin to textual search can provide a depth of information sharing that inter-system transport can never accomplish alone.
Ray
I'm unclear about what you're proposing Ray. Perhaps you're still confused or I have misunderstood what you wrote.
The NIEM framework need not be "augmented with search". That is to say, NIEM has nothing to do with search…nada, zip. Two totally different discussions. As Andrew pointed out earlier, NIEM simply ensures semantic consistency for the purpose of information exchange.
When I think NIEM I think of other exchange frameworks such as NITF (news publishing) and HL7 (healthcare). Loosely speaking, not unlike storage standards such as SMI-S and XAM.
Newspaper publishers can run searches against their content repositories, and use NITF (the news industry's equivalent of NIEM) to then share the information between publishers. NITF ensures their systems "speak the same language". Similarly, healthcare organizations can search their repositories and use HL7 to then exchange the information.
NIEM simply ensures that government agencies' and suppliers' systems can exchange content in a way that they'll all understand.
Perhaps you're talking about allowing more extensive access-controlled federated search of government repositories, but you'd still need NIEM to enable the exchange of bits and bytes between systems.
Joseph: I guess we are going to disagree on this. Search seems to me an integral part of the mandate for NIEM as I read it. Text based search would minimize the need for an information exchange superstructure to provide this.
Ray
Ray: I am very interested in getting this issue straightened out. Can you please clarify for me what exactly you're reading that you interpret as "searching" being an integral part of NIEM? Thanks!
Andrew: Thanks again for all the comments. On NIEM.gov's home page it says: "… effectively share critical information in emergency situations, …" I read such a statement as a requirement for search. I believe any sharing of critical information during emergency situations goes beyond defined system to system interfaces and requires a broad based search across databases, systems and jurisdictions. In my opinion, text based search, from Google, Microsoft, or others provides the easiest, quickest, and cheapest way to provide such capabilities.
Ray: I think you're reading this the wrong way. Sharing information in emergency situations is quite broad and is much much more than just searching. Consider a 911 center entering information in their computer aided dispatch system then sharing that information with other systems to coordinate response to an emergency. I consider this information sharing in emergency situations that really has nothing to do with searching.
When you say: "I believe any sharing of critical information during emergency situations goes beyond defined system to system interfaces and requires a broad based search across databases, systems and jurisdictions." You're really stating an oxymoron. A broad search across databases, systems, and jurisdictions will still require some sort of defined interface. Otherwise, how does each disparate system know what it's sending and receiving? As soon as you begin to connect disparate systems, you need a platform-independent way of representing semantics of data, otherwise you run the risk of data quality issues. Imagine poor data quality in an emergency situation.
Consider the concept of federated search. This is where I have the ability to search my local system, but behind the scenes my local system may reach out to other systems. In this case, NIEM would define what the request and reply messages look like.
I really feel like you're confusing human-to-system interaction with system-to-system interaction. NIEM is only about system-to-system. If I have a user interface that just sits on top of a local database, there is no role for NIEM. But as soon as I want that local database to populate information from other disparate systems, now there is a role for NIEM.
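A small Python sketch may help picture the federated search case. The user queries one local entry point, and behind the scenes the same request message goes to each participating system, which answers with the same reply message. The message shapes, system names, and records here are hypothetical stand-ins; in a NIEM-based exchange the request and reply would be defined by the exchange standard rather than by these toy classes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class SearchRequest:
    """Agreed request message (hypothetical): look up records by surname."""
    person_surname: str


@dataclass(frozen=True)
class SearchReply:
    """Agreed reply message (hypothetical): which system answered, and with what."""
    source_system: str
    record_ids: List[str]


Handler = Callable[[SearchRequest], SearchReply]


def make_handler(system_name: str, local_records: Dict[str, str]) -> Handler:
    """Wrap one system's private storage (record id -> surname) behind the agreed messages."""
    def handle(request: SearchRequest) -> SearchReply:
        wanted = request.person_surname.lower()
        matches = sorted(rid for rid, surname in local_records.items()
                         if surname.lower() == wanted)
        return SearchReply(system_name, matches)
    return handle


def federated_search(request: SearchRequest, handlers: List[Handler]) -> List[SearchReply]:
    """Send the same request to every participating system and collect the replies."""
    return [handler(request) for handler in handlers]


# Hypothetical participating systems and records.
court = make_handler("court", {"case-7": "Example", "case-9": "Sample"})
police = make_handler("police", {"inc-3": "example"})

for reply in federated_search(SearchRequest("Example"), [court, police]):
    print(reply.source_system, reply.record_ids)
```

Each system keeps its own storage and autonomy; only the request and reply formats are shared.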
Andrew: I guess we're going to have to disagree. A broad search does not need system defined interfaces if it is text based. As for data quality, this depends on the source of the data, e.g., the 911 center's data quality would be the same if it was in a text form or a database record.
Without an interface, how does that text enter or leave a system? As far as I'm concerned, it is impossible to do a system-to-system transaction without some type of interface. Interfaces can be very loosely or tightly defined, but you need interfaces to support system-to-system communication.
My concern about data quality is not about how it's stored, but how it's interpreted via the exchange. For example, the element "Case Type" means different things to different domains. Courts may view "Case Type" as a way to delineate between civil and criminal cases whereas the Transportation Security Administration may view "Case Type" as a way of describing a piece of luggage. My concern about data quality is being able to preserve semantics so information is not misinterpreted.
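The "Case Type" problem can be shown with a toy Python example. The records, domain labels, and values are invented for illustration. Matching on the bare element name mixes the two meanings, while qualifying the name with its domain keeps them apart, which is roughly what a shared vocabulary is meant to guarantee.

```python
# Hypothetical records from two domains that both use an element named "CaseType".
records = [
    {"domain": "courts", "element": "CaseType", "value": "criminal"},
    {"domain": "tsa", "element": "CaseType", "value": "hard-shell"},
]


def find_by_name(items: list, element: str) -> list:
    """Match on the bare element name only, as a purely syntactic search would."""
    return [r["value"] for r in items if r["element"] == element]


def find_by_qualified_name(items: list, domain: str, element: str) -> list:
    """Match on domain plus element name, so each meaning stays separate."""
    return [r["value"] for r in items
            if (r["domain"], r["element"]) == (domain, element)]


print(find_by_name(records, "CaseType"))                      # prints ['criminal', 'hard-shell']
print(find_by_qualified_name(records, "courts", "CaseType"))  # prints ['criminal']
```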
We clearly have two different perspectives here. You're operating under the assumption that information that you want to search already safely resides in a system. I'm coming from the perspective that you need to get that information into the system from another system while preserving the original intent of that information.