1. DCMs in MDA
Modern development techniques often apply the Model Driven Architecture (MDA). In this approach, models are central and reside mostly at the logical level. DCMs have a place in MDA, which can best be explained using the Generic Component Model (GCM) [10]. This cubical model positions DCMs in healthcare architectures using a three-dimensional space. The GCM characterizes any system by three axes: domain, system components, and system development (Figure 2).
At the system axis (x-axis), the Reference Model of Open Distributed Processing (RM-ODP) serves as a coordinating framework. This framework comprises five viewpoints: the enterprise, information, computational, engineering, and technology viewpoints. The RM-ODP positions DCMs in the enterprise, information, and computational viewpoints (the latter, e.g., for detailed computational specifications, such as the calculation of total scores on data).
The second axis (y-axis) specifies the system development approach of the MDA. MDA separates the healthcare business and the application logic of EHRs from the specific implementation technology [11]. According to Blobel [10], MDA depends on standards, traceability, and explicit relationships between system components. At the lowest level of clinical detail, that is exactly what DCMs provide: consistency, traceability, and reusability, while covering the conceptual and logical levels. DCMs fit into larger logical models, such as reference (information) models.
At the domain axis (z-axis), the different healthcare domains, such as clinical specialties, are depicted, ranging from the business level at the top down to the fine-grained data elements at the bottom. The latter are specified in DCMs and are reusable from domain to domain.
On the physical level, DCM-based health data from EHRs must be storable and remain available for many years and across numerous technologies. Examining applicable technologies for data preservation is therefore relevant for clinical modeling exercises.
File format conversion engines that are constrained to one data type and an in-house software base are available [12]. For example, FileFormat.Info (http://www.fileformat.info) includes file format conversion tools for images only, based on the Java Advanced Imaging libraries (javax.imageio.* and javax.media.jai.*). A few file format conversion services exist that support only certain conversion types (e.g., http://www.ps2pdf.com, 1 conversion type; http://media-convert.com, about 20 multimedia formats; http://www.zamzar.com, selected conversions of document, image, music, video, and a couple of CAD formats). The main drawback of the existing conversion systems is that they are not extensible, being limited by the availability of specific libraries.
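To make the extensibility argument concrete, the following is a minimal sketch of one possible design: converters are registered as edges in a format graph, and a conversion chain is found by breadth-first search, so new third-party tools can be plugged in without changing the engine. All converter names and formats here are hypothetical, not taken from any of the systems above.

```python
from collections import deque

class ConversionRegistry:
    """Registry of converters, each handling one (source, target) format pair."""

    def __init__(self):
        self.converters = {}  # (src, dst) -> callable

    def register(self, src, dst, func):
        # Extensibility point: plugging in a new tool adds an edge to the graph.
        self.converters[(src, dst)] = func

    def find_chain(self, src, dst):
        """Breadth-first search for the shortest conversion chain src -> dst."""
        queue = deque([(src, [])])
        seen = {src}
        while queue:
            fmt, chain = queue.popleft()
            if fmt == dst:
                return chain
            for (s, d), func in self.converters.items():
                if s == fmt and d not in seen:
                    seen.add(d)
                    queue.append((d, chain + [func]))
        return None  # no conversion path exists

# Hypothetical converters, standing in for wrapped third-party tools.
registry = ConversionRegistry()
registry.register("doc", "pdf", lambda data: data + "->pdf")
registry.register("pdf", "txt", lambda data: data + "->txt")

chain = registry.find_chain("doc", "txt")  # chains doc->pdf->txt automatically
result = "file.doc"
for step in chain:
    result = step(result)
```

Adding support for a new format then only requires one `register` call per converter, rather than modifying the conversion engine itself.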
In order to design an extensible file format conversion system that utilizes third-party software, several problems have to be addressed [12]. The first is the automated execution of the software, most of which is GUI-based, without access to an application programming interface; AutoHotkey (http://www.autohotkey.com) scripting is a viable option for the Windows operating system, and the current Polyglot implementation is based on it. The second is the problem of distributed computational resources, which has been approached in the past by the Grid community, for example TeraGrid (https://www.xsede.org/tg-archives) and the Globus Toolkit (http://toolkit.globus.org/toolkit/) for building computational grids, and through the design of workflow middleware that manages the execution, such as DAGMan, CCA (http://www.cca-forum.org), or Taverna (http://www.taverna.org.uk/), among others.
Due to the heterogeneity of computational hardware, this problem also requires consideration of options for parallel processing [12], for instance: 1) the Message Passing Interface (MPI), designed to coordinate a program running as multiple processes in a distributed-memory environment by passing control messages; 2) Open Multi-Processing (OpenMP), intended for shared-memory machines, which uses a multithreading approach in which a master thread forks any number of worker threads; 3) the MapReduce parallel programming paradigm for commodity clusters, which lets programmers write simple Map and Reduce functions that are then automatically parallelized, without requiring the programmers to code the details and communication of the parallel processes; and 4) novel architectures such as FPGAs, GPUs, and multiple CPUs. Unfortunately, none of the existing grid solutions is an option when utilizing third-party binaries compiled for specific hardware on one machine.
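The MapReduce paradigm mentioned under 3) can be sketched as follows: the programmer supplies only the map and reduce functions, while the framework handles sharding, parallel execution, and shuffling. The single-machine Python simulation below shows just the programmer-facing part (a word count over input records); a real framework such as Hadoop would run the phases in parallel across a cluster.

```python
from collections import defaultdict

def map_fn(record):
    # Map: emit (key, 1) for every word in one input record.
    return [(word, 1) for word in record.split()]

def reduce_fn(key, values):
    # Reduce: sum all counts emitted for one key.
    return key, sum(values)

def map_reduce(records):
    # Map phase (would run on many workers in parallel).
    intermediate = [pair for rec in records for pair in map_fn(rec)]
    # Shuffle phase: group intermediate values by key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Reduce phase (also parallel per key in a real framework).
    return dict(reduce_fn(k, vs) for k, vs in groups.items())

counts = map_reduce(["pressure ulcer risk", "pressure ulcer care"])
```

The appeal for heterogeneous environments is exactly that the two user functions contain no parallelism or communication logic at all.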
Workflow solutions could potentially orchestrate calls to computational resources based on a conversion sequence; however, most do not deal robustly with solely GUI-based software. Task-specific needs must also be considered, such as clustering the conversion execution sequence into segments that do not require data movement, and then managing and monitoring entire conversion executions [12].
Such technologies would contribute to the preservation of data in EHRs and to their use in various systems, across the various technical formats and representations of the same clinical data based on DCMs.
2. Requirements for Clinical Data
Healthcare has several different purposes for the use of clinical data, such as clinical care, continuity of care, quality indicators, decision support, management, billing, clinical trials, and epidemiological studies, among others [9]. This is illustrated in Figure 3. Each purpose requires analysis of the requirements, in particular data granularity, validity, relevance, precision, and reliability. Each data use implies a specific set of attributes and constraints for data entry, storage, processing, presentation, communication, selection, and aggregation. Importantly, each reuse of data from an EHR also poses validity and reliability questions: are there biases, confounders, or other factors to be taken into account? Finally, each data use has its own expectations for data preservation. The assumption is that clinical data are recorded into an EHR system during the primary care process and that DCMs are applied to guide the data entry. For each purpose, the DCMs serve as the semantic baseline. For instance, for continuity of care, the DCMs are transformed into an HL7 v3 XML format; hence they serve to define the message or document payload. Here the DCMs prove their value: it may be that all data according to the DCM are stored in the EHR, but not all data need to be exchanged. At the message/document definition, a selection of the data can be made. However, each data element exchanged will still have all of the standardized characteristics according to the DCM.
For all other purposes, similar mechanisms apply. All data in the EHR are available, a selection process may minimize the amount of data required, and the aggregation process then adds some features. For instance, for a quality indicator, the numerator can be derived from a data element standardized in a DCM: with 'pressure ulcer risk present' specified as the resulting data of a risk assessment, an occurrence of 80 in a patient population of 320 (the denominator) would reveal an incidence rate of 25%. The data element handled comes from the DCM. The calculations are then based on scientific methods for incidence rates, and some policy might determine that the percentage of patients at risk for pressure ulcer is a good quality indicator. However, in actual care, this indicator might need a second standardized, DCM-based data element to be present and handled similarly, e.g., "patient receives preventive measures for pressure ulcer." If 80 of the 320 patients received preventive care, this would also be 25%, a perfect match to the risk and hence perfect care. For most goals, a selection of data elements according to DCMs will be used, while each element still keeps its standard features.
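The aggregation step described above can be sketched as a small calculation. This is an illustrative sketch only, not an implementation from the paper; the function name and counts come from the pressure ulcer example in the text.

```python
def incidence_rate(numerator_count, population):
    """Quality-indicator aggregation: occurrences of a DCM-based data
    element (numerator) as a percentage of the patient population
    (denominator)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100.0 * numerator_count / population

# Worked example from the text, for a population of 320 patients:
# 80 occurrences of 'pressure ulcer risk present' (first DCM-based element)
at_risk = incidence_rate(80, 320)
# 80 occurrences of 'patient receives preventive measures for pressure ulcer'
prevented = incidence_rate(80, 320)
# Equal rates indicate that every at-risk patient is covered.
perfect_care = at_risk == prevented
```

The point is that both the numerator element and the preventive-care element are standardized in DCMs, so the same aggregation logic applies to any indicator built on them.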
4. Comparing Existing Formats
Comparing existing formats for the same medical concepts reveals the residue that is the core clinical content specified in DCMs. Goossen and Goossen-Baremans [13] carried out an analysis of clinical concepts in the formats of an archetype, an HL7 v3 model, and a DCM in UML. Although that approach used a specific bottom-up analysis and looked at data types and code bindings, it also showed the overall models. In this paper, we present a similar analysis at the logical model level of the Glasgow Coma Scale (GCS) [14]. The GCS is used to determine the level of consciousness of patients after trauma, with stroke, or with other head injuries [14]. The GCS consists of three categories of data, representing eye opening, best motor response, and best verbal response, which are summed into a total score. The GCS is scored by documenting the number representing the best response that could be observed for each category.
Table 2 specifies the conceptual knowledge about the GCS.
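As a sketch of the computational knowledge behind the GCS (the kind of total-score calculation placed in the RM-ODP computational viewpoint in Section 1), the following assumes the standard clinical value ranges for the three categories; the exact conceptual specification is the one in Table 2.

```python
# Standard value ranges of the three GCS categories (best response per category).
GCS_RANGES = {
    "eye_opening": (1, 4),
    "verbal_response": (1, 5),
    "motor_response": (1, 6),
}

def gcs_total(eye_opening, verbal_response, motor_response):
    """Sum the best observed response per category into the GCS total (3-15)."""
    scores = {
        "eye_opening": eye_opening,
        "verbal_response": verbal_response,
        "motor_response": motor_response,
    }
    for name, value in scores.items():
        lo, hi = GCS_RANGES[name]
        if not lo <= value <= hi:
            raise ValueError(f"{name} must be between {lo} and {hi}")
    return sum(scores.values())

total = gcs_total(4, 5, 6)  # fully alert patient scores the maximum of 15
```

Constraining each data element to its valid range is exactly the kind of rule a DCM standardizes once, so that every implementation format enforces it identically.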
The different expressions of the GCS in the different modeling approaches are illustrated below. For the ADL and XML, the basic parts showing the semantics are presented; many technical ADL and XML specifications have been left out for readability. First, the GCS core as an archetype version is depicted (Figure 4). Next, the HL7 v3 RIM-based artifact is shown (Figure 5), with its XML equivalent (Figure 6). Finally, the UML representation of a DCM is shown (Figure 7), along with a preliminary representation using the recent HL7 Fast Healthcare Interoperability Resources (FHIR) approach (Figure 8). Note that a DCM can be expressed in any logical modeling method; UML is used merely to illustrate the commonalities and differences.
HL7 is currently investing in a new format for data exchange, FHIR [15]. The power of preserving investments in DCMs is best illustrated by the fact that, in a very short time, a FHIR resource exporter could be added to the modeling tool used, making export to the FHIR resource format available. Figure 8 illustrates a fragment of the FHIR resource, showing the four core data elements of the GCS. Note that, to keep this readable, the XML parts that define the GCS and its values in FHIR have been left out of the example, and non-core parts have also been removed. FHIR allows both Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) coding [16] and Logical Observation Identifiers Names and Codes (LOINC) coding [17] to be present in the definition. Here we show only LOINC codes as an example.
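To give a flavor of the FHIR representation alongside the XML fragment in Figure 8, the sketch below assembles a minimal FHIR Observation for a GCS total score as a Python dictionary serialized to JSON (FHIR supports both XML and JSON). It is an illustrative skeleton, not the resource from Figure 8: the LOINC code 9269-2 (Glasgow coma score total) is shown as an example binding, and a production resource would carry further metadata and per-category components.

```python
import json

def gcs_total_observation(patient_ref, total_score):
    """Minimal, illustrative FHIR Observation (JSON form) for a GCS total."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {
            "coding": [{
                "system": "http://loinc.org",
                "code": "9269-2",  # LOINC: Glasgow coma score total
                "display": "Glasgow coma score total",
            }]
        },
        "subject": {"reference": patient_ref},
        "valueQuantity": {"value": total_score, "unit": "{score}"},
    }

obs = gcs_total_observation("Patient/example", 13)
print(json.dumps(obs, indent=2))
```

Because the DCM already fixes the data element, its value range, and its terminology binding, generating such a resource from the model is a mechanical transformation, which is what made the rapid addition of a FHIR exporter feasible.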
What these examples clearly illustrate is that the medical knowledge is the same: each model expresses the core components of the GCS, and that must be so to ensure semantic interoperability. However, due to the technological choices, the technical specification part of each implementation specification obviously differs. The art of modeling requires that we make every attempt to move from the technological approach toward the clinician, and this can be done through the logical modeling of the conceptual content. In other words, let the technicians and modelers deal with the intricacies of modeling and do not bother doctors and nurses with them, but offer tooling that allows this to be done consistently, so that every implementation format at the computational level can use it adequately.
These results indicate that it is feasible to compare and reuse information models for single or combined clinical data elements and for assessment scales from one implementation approach to another [13, 18]. When the specific limitations of each approach are taken into account and a precise analysis of each data item is carried out, it is possible to reveal the semantics of the different models, abstract from them, and transform them into another logical model. In particular, the HL7 template approach and the ISO/CEN 13606 and openEHR archetypes reveal more commonalities than differences. Semantics concern the interaction between the medical knowledge represented in the clinical concepts, the information model representing it in technology, and the terminology model revealing its meaning [13]. The presented models have a generic and equivalent structure into which these concepts fit. These structures include, for HL7 v3, the Clinical Statement Pattern, and for the 13606 and openEHR archetypes, the Entry level. Both structures allow 1 - n data elements to be represented and linked together. The DCM example in UML applies a full class diagram in which the concept is modeled; each data element is represented in a class. According to Goossen and Goossen-Baremans [13], the best level for comparing HL7 v3 and openEHR is the Clinical Statement Pattern versus the Entry level, respectively, where both express a single clinically relevant data element. However, concepts partly derive their meaning from the structure they are embedded in. Hence this bottom-up approach will lead to 100% equivalence of basic semantics for data elements, but it will never lead to 100% comparability of the much more abstract reference models.
However, it is possible to extract data from storage in one formalism and represent it in another formalism, and for the preservation of data that is very important.