Semantic
Data Services for:
Brooke Stevenson
MetaMatrix
Account Manager, Civilian Sales
bstevenson@metamatrix.com
What we have here is a failure to Interoperate
A History of Data Interoperability
It is well documented that the Federal Government continues to struggle with, and fail at, implementing large-scale information-sharing programs. Presidential directives call for information sharing; OMB has promulgated the Data Reference Model; DISA promotes Net Centric Enterprise Services; DOJ spent 4 years creating the Global Justice XSD and has now joined with DHS to develop the National Information Exchange Model (NIEM); and the intelligence community has multiple efforts underway (DNI ESI, IC-MAP DoDIIS Data Layer). Despite these and many other efforts to address information sharing, the goal remains elusive.
The complexity of these information-sharing programs is largely due to their dependency on a wide variety of disparate information sources: data silos independently designed and managed across departments, agencies, and state and local governments for other ongoing missions. There are significant roadblocks to accessing and integrating the data in these systems:
So, can the interoperability problem be solved? The government is on the right track with its decision to adopt Service Oriented Architecture (SOA). SOA is a set of policies, practices, principles, and frameworks that allow data and processes to be encapsulated as a set of software services with standard interfaces and protocols, which can be accessed by a growing and ever-changing community of information consumers. By leveraging the industry-standard XML, SOAP, WSDL, and UDDI specifications, services can be published, discovered, and used in a technology-neutral, standard way, reducing the dependency on custom coding. XML, the most widely used of these standards, is used to define domain vocabularies such as GJXDM, NIEM, EDXL, HL7, and C2IEDM, among many others.
Given the increasing interest in and adoption of these vocabularies, the availability of commercial off-the-shelf (COTS) tools for producing data services that conform not only to these vocabularies but also to web service standards means that the solution to the interoperability conundrum is at hand.
Semantic Data Services
Semantic Data Services leverage domain vocabularies and technology standards to provide secure real-time access to existing data sources. Providing the context for sharing information based on program needs, Semantic Data Services also support the dispersed data ownership requirement that generally exists for these programs.
Semantic Data Services are:
1. Data objects with domain-specific semantics (vocabularies)
2. A means of access to legacy information systems
3. Accessible via SQL, XQuery, or SOAP requests
4. Reachable through JDBC, ODBC, SOAP, or JMS interfaces
5. Defined as relational schemas, XML documents, or W3C-compliant web services
6. Discoverable via UDDI registries as WSDL, or
7. Discoverable via standard ANSI SQL-92
Semantic Data Services provide access to data sources that have been established independently by state and local governments, federal agencies, and commercial information providers. The disparate nature of these data sources presents two key challenges to sharing:
1. A variety of software platforms.
Many commercial technologies have been developed and used to integrate information across different data management platforms, typically using connector or adaptor frameworks. Some of the more well-known integration technologies are MOM, EAI, ETL, and EII. The best of these commercial technologies require very little custom programming, if any. Semantic Data Services eliminate custom coding completely.
2. A range of vocabularies.
The vocabularies typically vary with the mission of the data-collecting entity and are unique to the vernacular of the agency, user community, and technical design team. For example, the Baltimore City Police Department may reference an individual as a “Suspect” within an Oracle database, while the FBI might use the term “Person of Interest” in an XML document. The difference in terminology, “Suspect” versus “Person of Interest”, is a semantic problem. Understanding semantics is essential to interoperability.
For lack of helpful tools, the magnitude of the semantic problem has been universally underestimated, and it is often the downfall of progress within a program. With the introduction of Semantic Data Services and other semantic mediation technologies, it is now possible to discover and manage semantic relationships across information systems in a timely manner and on a large scale. Commercial semantic mediation technology fills the last technology gap the government needs to close to support information sharing.
Understanding Semantic Data Services

Figure 1. Data Source Services (defined below) provide seamless access to disparate information systems. Business Data Services (defined below), as depicted, can be mapped from Data Source Services, but they can also be transformed from other Business Data Services, as depicted in Figure 2, where the XML Document Interface is accessed by the TWPDES Business Data Service.
Semantic Data Services fall into two categories:
1. Data Source Services: facades that encapsulate existing information resources and their native semantics, allowing seamless access to, federation of, and deployment of disparate data sources.
Data Source Services will likely be in the form of XML documents or Web Services as independent data owners migrate to SOA standards. Many Data Source Services are defined in the form of Relational Schemas, leveraging out-of-the-box connectivity to standard JDBC and ODBC databases across the local, state, regional, and federal organizations. Data Source Services also leverage connectivity to other data applications, such as spreadsheets, text files, ERP, CRM, and other proprietary applications.
Data Source Services, typically captured through automated wizards, contain knowledge of the underlying structure and semantics of data sources. They provide an information abstraction layer that can manage data access and interchange in a uniform manner.
2. Business Data Services: data objects or views transformed from Data Source Services, which support real-time information retrieval according to domain- or mission-specific semantics.
As the government continues to adopt Service Oriented Architecture (SOA) and evolve interoperability standards such as the Global Justice XML Data Model (GJXDM) and the National Information Exchange Model (NIEM), Business Data Services will most likely be defined in the form of Web Services that provide access to data in the form of XML documents. These documents will comply with pre-defined information exchange standards such as the Terrorist Watchlist Person Data Exchange Standard (TWPDES). TWPDES, produced by the ICMWG (Intelligence Community Metadata Working Group), is currently in the form of a GJXDM-compliant XML Schema.
Generation of Business Data Services is also semi-automated by wizards. Represented as relational schemas, XML Schemas, or Web Services, these services can be reverse-engineered from XSD files, WSDL files, or third-party modeling tools such as ERwin, Popkin, or Rational Rose.
Business Data Services are defined with mappings to one or more Data Source Services, as depicted in Figure 1. This is where the details of data integration are defined, including such commonly needed data reconciliation functions as name, attribute, and data type conversions. These services represent combined, transformed views of the relevant data sources. Although Business Data Services are more abstract in nature (no data source exists with this schema instantiated as its model), they appear as concrete as a Data Source Service when they are deployed.
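The mapping idea can be illustrated with a minimal sketch. It is not how any particular COTS product works; it simply assumes two hypothetical source tables (police_suspect and federal_poi, standing in for Data Source Services) and defines a single view (person_of_interest, standing in for a Business Data Service) that applies name and data type conversions. All table and column names are invented for illustration.

```python
import sqlite3


def build_demo_connection():
    """Create two hypothetical source tables and one combined view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Hypothetical Data Source Service 1: local police records.
        CREATE TABLE police_suspect (suspect_name TEXT, dob TEXT, height_in TEXT);
        INSERT INTO police_suspect VALUES ('Jane Doe', '1970-01-31', '65');

        -- Hypothetical Data Source Service 2: federal records.
        CREATE TABLE federal_poi (full_name TEXT, birth_date TEXT, height_cm INTEGER);
        INSERT INTO federal_poi VALUES ('John Roe', '1982-07-04', 180);

        -- Stand-in for a Business Data Service: one vocabulary over both sources.
        -- Name conversion: suspect_name/full_name -> person_name.
        -- Type and unit conversion: height text in inches -> integer centimeters.
        CREATE VIEW person_of_interest AS
            SELECT suspect_name AS person_name,
                   dob AS birth_date,
                   CAST(ROUND(CAST(height_in AS REAL) * 2.54) AS INTEGER) AS height_cm,
                   'police' AS source
            FROM police_suspect
            UNION ALL
            SELECT full_name, birth_date, height_cm, 'federal'
            FROM federal_poi;
        """
    )
    return conn


def query_persons(conn):
    """Query the view as if it were a concrete table."""
    return conn.execute(
        "SELECT person_name, birth_date, height_cm, source "
        "FROM person_of_interest ORDER BY person_name"
    ).fetchall()


rows = query_persons(build_demo_connection())
```

No table with the person_of_interest schema exists in the database, yet a consumer queries it exactly as it would a physical table, which is the sense in which an abstract service appears concrete once deployed.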
Business Data Services are a subset of the broader set of Business Services that an interoperability program will be required to support. The other primary type of Business Service is the Business Process Service, which is generally created and deployed with MOM, EAI technology, or on an Enterprise Service Bus. Business Process Services will leverage Business Data Services for domain-specific access to disparate information sources.
We have now defined a repeatable model for supporting interoperability across any arbitrary number of information sources:
Semantic Data Services for “Working Groups”
The burden in managing information access in a shared environment is typically split between the centralized program team and the data owners. Programs often form “Working Groups” to encourage frequent communication and joint decision making between these independent groups. Semantic Data Services provide a tool to facilitate a very granular level of communication regarding what information each party will provide or consume and how.
However, often the sticking point for providers is that ultimately they also have the burden of supporting legacy applications that leverage the same data that is desired for the sharing initiative. Business Data Services are an abstraction mechanism that can be used by data owners and/or the centralized program team to control who sees and has access to what information. Specifically, when data owners prefer not to provide direct access to databases or data applications, they can use the concept locally to define XML Documents and/or Web Services in an effort to better control the shared access.
The Watch-list Example
The Semantic Data Services approach can be used to tackle the Terrorist Tracking problem. Given TWPDES (Terrorist Watchlist Person Data Exchange Standard), an existing XML Schema, a Business Data Service can be generated to expose all “person of interest” information. The TWPDES Business Data Service represents a combined, transformed view of the various government data sources (local, state, regional, and federal) across which we can build a complete interoperable picture for tracking terrorists.

Figure 2. The process for generating a TWPDES-compliant Web Service.
The process for generating TWPDES-compliant Web Services involves several steps, most of which are automated within COTS Semantic Data Services products:
1. Import of the existing TWPDES.xsd
2. Reverse-engineering of the Data Source Services
3. Generation of a TWPDES XML Document (Business Data Service), which is mapped to Data Source Services
4. Automatic (if a pre-defined WSDL exists) or manual generation of TWPDES Web Service (Business Data Service), which is mapped to the TWPDES XML Document
5. Automatic generation of WSDL (if it was not pre-defined) and deployment to a UDDI registry
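Step 3 above, producing an XML document from mapped source data, can be sketched in a few lines. This is only an illustration under stated assumptions: the element names (WatchlistPersons, Person, PersonName, PersonBirthDate) and the source column names are hypothetical placeholders, not actual TWPDES element names, and a real product would derive the mapping from the imported schema rather than from a hand-written dictionary.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from source column names to target element names.
# These are placeholders for illustration, not real TWPDES element names.
COLUMN_TO_ELEMENT = {
    "suspect_name": "PersonName",
    "dob": "PersonBirthDate",
}


def build_document(records):
    """Build an XML document from source records using the mapping above."""
    root = ET.Element("WatchlistPersons")
    for record in records:
        person = ET.SubElement(root, "Person")
        for column, element_name in COLUMN_TO_ELEMENT.items():
            # Only mapped columns are exposed; unmapped ones stay private.
            if column in record:
                ET.SubElement(person, element_name).text = str(record[column])
    return root


source_records = [
    {"suspect_name": "Jane Doe", "dob": "1970-01-31", "internal_id": 7},
]
document = build_document(source_records)
xml_text = ET.tostring(document, encoding="unicode")
```

Because only mapped columns reach the output, the unmapped internal_id field never appears in the document, which mirrors how a data owner can control what is shared.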
Leveraging the power of a metadata-driven architecture, rather than costly and brittle programmatic solutions, model-driven COTS software provides an extensible and scalable infrastructure for defining, relating, and accessing disparate government data sources.
Semantic Interoperability: Automatically Resolving Vocabularies
The Federal Government will benefit by adopting an automated approach to semantic mapping of disparate schemas and vocabularies. TWPDES, a discrete information exchange packet, has over 1,000 core entities. The NIEM Universal schema, one of many within NIEM, has over 100,000 core entities. No team of human beings will ever be able to map tens of thousands of entities across hundreds of data sources. Even programs that only deal with a dozen data sources yield hundreds of thousands of potential mappings. This mapping effort alone has stopped many interoperability programs dead in their tracks.
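A rough back-of-the-envelope count shows how quickly the candidate mappings grow. The sketch below assumes, purely for illustration, that each of a dozen data sources exposes 50 elements and that every source element must be compared against each of 1,000 target entities (the approximate TWPDES figure cited above); the per-source element count is a hypothetical number, not one taken from any real program.

```python
def candidate_mappings(elements_per_source, target_entities):
    """Count source-element/target-entity pairs a reviewer might have to consider."""
    return sum(count * target_entities for count in elements_per_source)


# Assumption: 12 sources with 50 elements each (hypothetical figures).
elements_per_source = [50] * 12
# Roughly the number of core entities the text attributes to TWPDES.
target_entities = 1000

total_pairs = candidate_mappings(elements_per_source, target_entities)
print(f"{total_pairs:,} candidate mappings")
```

Under these assumptions the count is 12 x 50 x 1,000 = 600,000 pairs, consistent with the "hundreds of thousands" order of magnitude described above.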
Automated semantic matching, a necessary component of any Semantic Data Services product, helps domain experts reconcile semantics more quickly across a dispersed information environment. The technology, an extensible ontology-driven tool, implements a variety of sophisticated algorithms for determining semantic equivalence. It leverages the previously defined Data Source and Business Data Services to speed deployment of a mediation solution by automatically exposing potential semantic matches.
Automated semantic matching discovers similarities between elements of heterogeneous structured data sources, that is, Data Source Services with department-specific vocabularies. It also supports matching elements of data sources, represented as Data Source Services, to the target schemas of Business Data Services, such as TWPDES or any other GJXDM, NIEM, C2IEDM, HL7, or similar schema. The key steps in running an automated semantic matching process are:
Examples of potential matches that can be automatically discovered with automated semantic matching are:

Figure 3. Semantic similarities are revealed across data sources using the semantic relationships stored within a pre-populated upper-ontology. The pre-populated upper-ontology is essentially a combined dictionary and thesaurus of the English language. This upper-ontology can be extended or replaced by domain-specific ontologies.
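The thesaurus-driven idea can be illustrated with a toy matcher. This is a deliberately simplified sketch, not the algorithm of any actual product: the synonym groups below are a hypothetical, hand-written stand-in for an upper-ontology, and the element names are invented. Real matching engines combine many more signals than exact synonym lookup.

```python
import re

# Hypothetical miniature "thesaurus": each set groups terms treated as synonyms.
SYNONYM_GROUPS = [
    {"suspect", "person of interest", "subject"},
    {"surname", "last name", "family name"},
    {"dob", "birth date", "date of birth"},
]


def normalize(name):
    """Turn identifiers such as 'PersonOfInterest' or 'LAST_NAME' into plain terms."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", name)  # split camelCase
    return re.sub(r"[_\-\s]+", " ", spaced).strip().lower()


def concept_of(name):
    """Return the index of the synonym group containing the name, or None."""
    term = normalize(name)
    for index, group in enumerate(SYNONYM_GROUPS):
        if term in group:
            return index
    return None


def propose_matches(source_elements, target_elements):
    """Propose (source, target) pairs whose names fall in the same synonym group."""
    matches = []
    for source in source_elements:
        concept = concept_of(source)
        if concept is None:
            continue  # unknown term: left for a domain expert to resolve
        for target in target_elements:
            if concept_of(target) == concept:
                matches.append((source, target))
    return matches


proposed = propose_matches(
    ["Suspect", "LAST_NAME", "badge_no"],
    ["PersonOfInterest", "FamilyName", "DateOfBirth"],
)
```

The matcher only proposes candidates; terms it cannot place, such as badge_no here, are left for a domain expert, which matches the role described above of aiding rather than replacing human review.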
Implications for Efficient and Effective Information Sharing
By capturing the transformations and mappings needed to define Business Data Services (from Data Source Services), which can then in turn be used to directly drive integration, COTS products can provide federation and semantic capabilities that are highly efficient (by leveraging existing agency data sources) and extremely flexible. Automated semantic matching functionality, with an extensible upper-ontology (see Figure 3) and match engine, greatly increases the efficiency with which the mappings can be discovered.
Semantic Data Services products provide the following benefits: