DM2 Data Groups
Information and Data
Information is the state of a something-of-interest that is materialized, in any medium or form, and communicated or received. In DoDAF V1.0, information was described using what was called a logical data model, which, even in DoDAF V1.0, permitted a less structured and formal description than the computer-science definition of a logical data model. In DoDAF V2.0, the emphasis is on identifying and describing information in a semantic form (what it means) and why it is of interest (who uses it). Although this may entail some formality, such as describing relationships between concepts, its purpose is to convey the information of interest in the frame of reference of the operator, executive, or business person.
Data is the representation of information in a formalized manner suitable for communication, interpretation, or processing by humans or by automatic means, and is concerned with the encoding of information for repeatability, meaning, and proceduralized use. While information descriptions are useful in understanding requirements, e.g., inter-federate information sharing requirements or intra-federate representation strategies, data descriptions are important for implementing those requirements responsively and for assuring interoperable data sharing within and between federates.
Data Group Description
The DoDAF Meta Model, for the data comprising Information and Data, is shown in the figure below.
Information and Data Model Diagram
Items of note are as follows:
- The key concept in this model is that Information describes some Thing - material, temporal, or even abstract, such as a relationship (Tuple) or set (Type).
- Since Information is a Thing, Information can describe other Information, e.g., metadata.
- A Name is a type of Information in that it describes a Thing. A Name may be short or long; there is no restriction. So a textual description can be thought of as just a long Name. Information is more general than text strings, however, and could be structured, formalized, or include other manners of description such as diagrams or images.
- Information, as a Resource Type, inherits whole-part, super-subtype, and before-after relationships.
- If Information is processable by humans or machines in a repeatable way, it is called proceduralized. Not all proceduralized information is necessarily computerized; forms, for example, are data proceduralized for repeatable human processing.
- Data to be proceduralized has associations such as parts and types, as well as other application-specific associations. So, in an Entity-Relationship model, Attributes are associated with Entities, and Entities are related according to verb phrases and cardinalities. In the physical schema, the fields are associated with data types.
- The representation for Data is not intended to cover all the details of, for instance, the metamodel underlying a relational database management system (DBMS), but just those aspects necessary to support the decision-making of the core processes.
- An Architectural Description describes an architecture. An Activity Model is an example of an Architectural Description. Two subtypes of Architectural Description are called out, the AV-1 and the Manifest, because of their importance in discovery and exchange, respectively. Note that the AV-1 information can also be provided in a structured manner, using the Project data group to describe the architecture project's goals, timeline, activities, resources, productions, rules, measures, etc. In a typical development project, the architecture descriptions will be at increasing levels of detail, what John Zachman calls "levels of reification".
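A few of the relationships in the list above can be sketched as a small object model. This is a minimal Python sketch under stated assumptions: the class names mirror the DM2 concepts Thing, Information, Name, and Data, but the attributes `describes`, `described_by`, `proceduralized`, and `text` are hypothetical illustrations, not the DM2 schema or its exchange format.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical, simplified rendering of a few DM2 Information and Data concepts.
# Attribute names are illustrative only; they are not taken from the DM2 schema.

@dataclass(eq=False)
class Thing:
    """Anything of interest: material, temporal, or abstract."""
    label: str
    # Information objects that describe this Thing (repr=False avoids cyclic output).
    described_by: List["Information"] = field(default_factory=list, repr=False)

@dataclass(eq=False)
class Information(Thing):
    """Information describes some Thing; being a Thing itself, it can be
    described by other Information (i.e., metadata)."""
    describes: Optional[Thing] = None
    proceduralized: bool = False  # processable by humans or machines in a repeatable way

    def __post_init__(self):
        # Record the inverse link so the described Thing knows what describes it.
        if self.describes is not None:
            self.describes.described_by.append(self)

@dataclass(eq=False)
class Name(Information):
    """A Name is Information that describes a Thing; its length is unrestricted."""
    text: str = ""

@dataclass(eq=False)
class Data(Information):
    """Assumed here to be Information that has been proceduralized."""
    proceduralized: bool = True

# Usage: a Name and a Data record describe an aircraft; a note describes the Name.
aircraft = Thing("Aircraft")
tail = Name("tail number", describes=aircraft, text="N12345")
note = Information("provenance note", describes=tail)  # information about information
record = Data("flight record", describes=aircraft)
print([i.label for i in aircraft.described_by])  # ['tail number', 'flight record']
```

Modeling Information as a subclass of Thing is what lets `note` describe `tail`, which is the "Information can describe other Information" point made in the list.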
It should be noted that all methods, even the most philosophical and methodical, involve ingesting some record of the enterprise's processes, its legacy information-keeping systems, and descriptions of the types of things it thinks it deals with. Once this raw data is collected, the terms within it are:
- Identified. This is done by noting recurring or key terms.
- Understood. Definitions of terms are sought and researched. In most cases, there are multiple authoritative definitions. Definitions selected should be appropriate for the context of use of the term within the enterprise activities.
- Collated and correlated. This is done by grouping seemingly similar or related terms.
- Harmonized. In this step, aliases, near-aliases, and composite terms are identified. A consensus definition is formulated from the authoritative source definitions. Often super-subtype and whole-part relationships begin to emerge.
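The mechanical parts of the steps above (identifying recurring terms, then collating and harmonizing aliases) can be sketched as follows. This is an assumption-laden illustration: frequency counting as the "recurring term" test, the `stopwords` set, and the hand-written `aliases` table are hypothetical choices, and the "Understood" step (researching authoritative definitions) is left to people and is not coded.

```python
import re
from collections import Counter

def identify_terms(raw_text, min_count=2, stopwords=frozenset()):
    """Identify: treat any lowercase word seen at least min_count times
    (excluding stopwords) as a recurring term. An assumed heuristic."""
    words = re.findall(r"[a-z][a-z\-]+", raw_text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    return {w for w, n in counts.items() if n >= min_count}

def harmonize(terms, aliases):
    """Collate and harmonize: group each term under a preferred term using a
    hypothetical alias table; terms with no alias stand for themselves."""
    groups = {}
    for term in terms:
        preferred = aliases.get(term, term)
        groups.setdefault(preferred, set()).add(term)
    # Sorted lists give a deterministic result for review.
    return {k: sorted(v) for k, v in groups.items()}

raw = "The aircraft lands. Each plane is refueled. The aircraft departs. The plane taxis."
terms = identify_terms(raw, stopwords={"the", "each"})
print(harmonize(terms, {"plane": "aircraft"}))  # {'aircraft': ['aircraft', 'plane']}
```

In practice the alias table is the output of the consensus work described in the Harmonized step, not something a script can discover on its own.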
The next step is to relate the harmonized terms. Some of the relationships are implicit in the definitions, and these definitions may contribute to the relationship descriptions. At this point, the formality can vary. A formal ontological approach will type all relationships to foundational concepts such as whole-part and super-subtype. However, such an approach raises many metaphysical challenges and is not necessary for many applications. The result constitutes the conceptual level of modeling: defined and related terms, now considered concepts because the definitions and relationships lend meaning to the terms. The conceptual model should be understandable by anyone knowledgeable about the enterprise. Super-subtype and whole-part relationships can provide cognitive economy. Conceptual models can be done in Entity-Relationship or UML Class model style, although any format that documents definitions and relationships is functionally equivalent. Note that the subtype concept in UML generally results in the subclass inheriting properties from the supertype, while in Entity-Relationship (E-R) modeling only the identifying keys are inherited directly; the other supertype properties are available after a join operation.
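The difference in subtype inheritance noted in the paragraph above can be shown side by side. In this minimal sketch, Python dataclasses stand in for UML-style classes and an in-memory SQLite database stands in for an E-R implementation; the `performer` and `organization` tables and the `mission` attribute are hypothetical.

```python
import sqlite3
from dataclasses import dataclass

# UML class style: the subclass inherits the supertype's properties directly.
@dataclass
class Performer:
    performer_id: int
    name: str

@dataclass
class Organization(Performer):
    mission: str  # hypothetical subtype-specific attribute

# E-R style: the subtype table inherits only the identifying key.
def build_er_model(conn):
    conn.executescript("""
        CREATE TABLE performer (
            performer_id INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE organization (
            performer_id INTEGER PRIMARY KEY REFERENCES performer(performer_id),
            mission TEXT
        );
    """)

def organization_with_name(conn, performer_id):
    """The supertype's other properties (here, name) are reached through a join."""
    return conn.execute(
        "SELECT p.performer_id, p.name, o.mission "
        "FROM organization o JOIN performer p ON p.performer_id = o.performer_id "
        "WHERE o.performer_id = ?",
        (performer_id,),
    ).fetchone()

org = Organization(1, "Logistics Command", "Sustain forces")
print(org.name)  # Logistics Command (inherited attribute)

conn = sqlite3.connect(":memory:")
build_er_model(conn)
conn.execute("INSERT INTO performer VALUES (1, 'Logistics Command')")
conn.execute("INSERT INTO organization VALUES (1, 'Sustain forces')")
print(organization_with_name(conn, 1))  # (1, 'Logistics Command', 'Sustain forces')
```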
At the logical level, relationships may have cardinalities or other rules added that indicate how many instances of one thing relate to an instance of another, whether such relations are mandatory, and so on. The concepts may also be attributed, meaning they will be said to have some other concept, e.g., the concept of eye has the concept of color. Often at the logical level, relationships are reified, that is, made concrete or explicit. This is done in case there is something additional that needs to be stated about the relationship, e.g., the quantity of some part of something or the classification of the related information, which may differ from the classification of the individual elements. There may also be considerations of normalization, meaning that the database structure is modified for general-purpose querying and is freed of certain undesirable characteristics during insertion, update, and deletion operations that could lead to a loss of data integrity. The benefits of normalization are that it uncovers additional business rules that might have been overlooked without its analytical rigor and that it ensures the precise capture of business logic. The logical model, though having more parts than the conceptual model, should still be understandable by enterprise experts. At the logical level, a modeling style such as Entity-Relationship or UML Class modeling is normally used.
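As an illustration of a reified relationship, the following sketch turns a whole-part relationship into its own table so that the quantity of a part and a classification of the relationship can be recorded on it. It assumes SQLite as the notation; all table and column names, and the default classification value, are hypothetical. The composite primary key and foreign keys express the cardinality rules, and keeping the relationship facts in one associative table (rather than repeating component details in each assembly row) reflects the normalization concern.

```python
import sqlite3

# Hypothetical logical-level fragment expressed as SQLite DDL.
SCHEMA = """
CREATE TABLE assembly (
    assembly_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE component (
    component_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- Reified whole-part relationship: each (assembly, component) pair appears once,
-- and the relationship carries its own attributes.
CREATE TABLE assembly_part (
    assembly_id INTEGER NOT NULL REFERENCES assembly(assembly_id),
    component_id INTEGER NOT NULL REFERENCES component(component_id),
    quantity INTEGER NOT NULL CHECK (quantity > 0),
    classification TEXT NOT NULL DEFAULT 'UNCLASSIFIED',
    PRIMARY KEY (assembly_id, component_id)
);
"""

def parts_of(conn, assembly_id):
    """List each part of an assembly with the attributes of the relationship."""
    return conn.execute(
        "SELECT c.name, ap.quantity, ap.classification "
        "FROM assembly_part ap JOIN component c ON c.component_id = ap.component_id "
        "WHERE ap.assembly_id = ? ORDER BY c.name",
        (assembly_id,),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(SCHEMA)
conn.execute("INSERT INTO assembly VALUES (1, 'Radio set')")
conn.executemany("INSERT INTO component VALUES (?, ?)", [(10, "Antenna"), (11, "Battery")])
conn.executemany(
    "INSERT INTO assembly_part (assembly_id, component_id, quantity) VALUES (?, ?, ?)",
    [(1, 10, 1), (1, 11, 2)],
)
print(parts_of(conn, 1))  # [('Antenna', 1, 'UNCLASSIFIED'), ('Battery', 2, 'UNCLASSIFIED')]
```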
At the physical level, the exact means by which the information is to be exchanged, stored, and processed is determined. At this level, we are talking about data. The efficiency, reliability, and assured repeatability of the data use are considered. The datatypes, that is, the exact formats in which the data is stored, are determined. A datatype needs to accommodate all the data that is permissible to store or exchange, yet be efficient and disallow formats that are not permissible. The entities may be de-normalized for efficiency so that join operations don't have to be performed. Logical associations may be replaced with identifiers (e.g., as associative entities or foreign or migrated keys in Entity Relationship Diagrams [ERDs], or as explicit identifier attributes or association classes in class models). Keys, identifiers, and other means of lookup are set up. Indexes, hashes, and other mechanisms may be set up to allow data access in accordance with requirements. The physical target may be any of the following:
- Database – relational, object, or flat file.
- Message exchange format – document (e.g., XML), binary (e.g., Interface Definition Language (IDL)).
- Cybernetic (human – machine), e.g., print or screen formats, such as forms.
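For the relational-database target, several of the physical-level decisions described above can be seen in one small schema. This is a sketch under assumptions: SQLite is used for convenience, and the table, the two-letter/six-digit serial-number format, the denormalized `unit_name` column, and the index are all hypothetical choices standing in for real requirements.

```python
import sqlite3

# Hypothetical physical-level schema for tracked equipment.
PHYSICAL_SCHEMA = """
CREATE TABLE equipment (
    equipment_id INTEGER PRIMARY KEY,   -- surrogate key used for lookup
    -- Datatype plus CHECK: accept only two uppercase letters followed by six digits.
    serial_no    TEXT NOT NULL UNIQUE
                 CHECK (serial_no GLOB '[A-Z][A-Z][0-9][0-9][0-9][0-9][0-9][0-9]'),
    unit_code    TEXT NOT NULL,
    -- De-normalized copy of the unit's name, so reads need no join to a unit table.
    unit_name    TEXT NOT NULL,
    mass_kg      REAL CHECK (mass_kg > 0)
);
-- Index for an assumed access requirement: look up equipment by unit.
CREATE INDEX idx_equipment_unit ON equipment(unit_code);
"""

def open_store():
    conn = sqlite3.connect(":memory:")
    conn.executescript(PHYSICAL_SCHEMA)
    return conn

def add_equipment(conn, serial_no, unit_code, unit_name, mass_kg):
    # "with conn" commits on success and rolls back if a constraint is violated.
    with conn:
        conn.execute(
            "INSERT INTO equipment (serial_no, unit_code, unit_name, mass_kg) "
            "VALUES (?, ?, ?, ?)",
            (serial_no, unit_code, unit_name, mass_kg),
        )

conn = open_store()
add_equipment(conn, "AB123456", "U01", "1st Signal Bn", 12.5)
try:
    add_equipment(conn, "ab-12", "U01", "1st Signal Bn", 3.0)  # impermissible format
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```

The CHECK constraint is one way a datatype can "disallow formats that are not permissible"; the de-normalized column trades update effort for cheaper reads.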
Usage in Core Processes
Information and Data models are used in the following ways:
- Commonality and Interoperability between Core processes
1. Information models materialize for enterprise participants what things are important to the enterprise and how they are related.
2. Information models can serve as a basis for standardization of terminology and concept inter-relationships for human, machine, and human-machine communications.
3. Information models can provide cognitive compactness for an enterprise's personnel through the use of taxonomies and other relationship structures. This can improve clarity, efficiency, accuracy, and interoperability of action within the enterprise.
4. Information models document the scope of things the enterprise is concerned with in a form that allows comparison with other communities of interest to reveal common interests.
5. Community of Interest (COI) coordination and harmonization.
6. Authoritative sources identification and management.
- JCIDS and PPBE
1) Data and information models can be used to determine if a proposed capability will interoperate, be redundant with, or fill gaps in conjunction with other capabilities.
- SE and DAS
1) Data models can be used to generate persistent storage of information such as in databases.
2) Data models can be used to generate formats for exchanging data between machines, humans, and machine-to-human. For example, an XSD is a physical data model that is generally an exchange format. Web services can be used with relational DBMSs to generate XML for exchange in the format of the data model implemented in the DBMS. The underlying data models (the physical data model and the exchange data format) do not have to be the same; a translator or mediator may be invoked to translate during the exchange.
3) Data models can be used to compare whether Performers are compatible for data exchange.
4) Interdependent data or information needs.
5) Data and information models can be used during milestone reviews to verify interoperability, non-redundancy, and sufficiency of the solution.
6) Information models are useful in the initial discovery of a service, to learn what sorts of information it may provide access to or what information its capabilities need. An information model is part of a service description.
7) Data models are useful in knowing how to interact with a service and the capabilities it provides and for establishing the service contract. A data model is part of a service description and service contract.
8) Database/sources consolidation and migration.
9) Standards definition and establishment.
10) Mediation and cross-COI sharing.
- OPS Planning
1) Data and information models can be used to determine if components of a portfolio have overlapping data or information production (an indication of potential unwanted redundancy).
2) Data assets management.
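Two of the uses listed above, generating an exchange format from stored data through a mediator and checking whether a producer can supply what a consumer needs, can be sketched with the Python standard library. This is a hypothetical illustration: the `equipment` table, the `FIELD_MAP` mapping, and the functions `export_xml`, `import_xml`, and `missing_fields` are invented for the example, and a real compatibility check would also compare datatypes and semantics, not just element names.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical mediator mapping: storage column name -> exchange element name.
# The stored (physical) model and the exchange format deliberately differ.
FIELD_MAP = {"serial_no": "SerialNumber", "unit_code": "UnitCode"}

def export_xml(conn):
    """Translate stored rows into an XML exchange document."""
    root = ET.Element("EquipmentList")
    cols = list(FIELD_MAP)  # fixed, trusted column names (no user input in the SQL)
    for row in conn.execute(f"SELECT {', '.join(cols)} FROM equipment ORDER BY serial_no"):
        item = ET.SubElement(root, "Equipment")
        for col, value in zip(cols, row):
            ET.SubElement(item, FIELD_MAP[col]).text = str(value)
    return ET.tostring(root, encoding="unicode")

def import_xml(xml_text):
    """Translate an exchange document back into storage column names."""
    reverse = {v: k for k, v in FIELD_MAP.items()}
    root = ET.fromstring(xml_text)
    return [{reverse[child.tag]: child.text for child in item}
            for item in root.findall("Equipment")]

def missing_fields(producer_fields, consumer_fields):
    """Crude compatibility check: fields the consumer needs but the producer
    cannot supply. An empty set suggests name-level compatibility only."""
    return set(consumer_fields) - set(producer_fields)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE equipment (serial_no TEXT, unit_code TEXT)")
conn.execute("INSERT INTO equipment VALUES ('AB123456', 'U01')")
doc = export_xml(conn)
print(doc)
# <EquipmentList><Equipment><SerialNumber>AB123456</SerialNumber><UnitCode>U01</UnitCode></Equipment></EquipmentList>
print(missing_fields(FIELD_MAP.values(), ["SerialNumber", "UnitCode", "Location"]))  # {'Location'}
```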
Information and Data are presented using all the forms shown in 1.3, and they manifest themselves in the presentation of many of the other Data Groups. Modeling information and data has well-established techniques and styles, although the techniques for constructing and presenting such models vary. They are taught in academic and vocational curricula, and there is considerable literature, such as books, professional journals, conference proceedings, and professional magazines, on best practices, experiences, and theory. The figure below illustrates some of the basic methods for model creation.
Examples of the Ways Information and Data Models are Constructed