

Healthc Inform Res > Volume 23(2); 2017 > Article
Park, Lee, On, Lee, Jung, and Park: 2016 Year-in-Review of Clinical and Consumer Informatics: Analysis and Visualization of Keywords and Topics



The objective of this study was to review and visualize the medical informatics field over the previous 12 months according to the frequencies of keywords and topics in papers published in the top four journals in the field and in Healthcare Informatics Research (HIR), an official journal of the Korean Society of Medical Informatics.


A six-person team conducted an extensive review of the literature on clinical and consumer informatics. The literature was searched using keywords employed in the American Medical Informatics Association year-in-review process and organized into 14 topics used in that process. Data were analyzed using word clouds, social network analysis, and association rules.


The literature search yielded 370 references and 1,123 unique keywords. ‘Electronic Health Record’ (EHR) (78.6%) was the most frequently appearing keyword in the articles published in the five studied journals, followed by ‘telemedicine’ (2.1%). EHR (37.6%) was also the most frequently studied topic area, followed by clinical informatics (12.0%). However, ‘telemedicine’ (17.0%) was the most frequently appearing keyword in articles published in HIR, followed by ‘telecommunications’ (4.5%). Telemedicine (47.1%) was the most frequently studied topic area, followed by EHR (14.7%).


The study findings reflect the Korean government's efforts to introduce telemedicine into the Korean healthcare system and reactions to this from the stakeholders associated with telemedicine.

I. Introduction

Medical informatics incorporates a core set of methodologies and technologies that are applied to the management of data, information, and knowledge at multiple levels, including molecular, tissue, patient, and population levels [1]. New methodologies and technologies are being implemented in these vast and expanding fields [2], and articles on a wide range of related topics have been published. Knowing which topics attract the most attention from authors is important for understanding the past and preparing for the future; surveying the published literature is therefore worthwhile.
Yergens et al. [2] reviewed the yearbooks published by the International Medical Informatics Association on medical informatics globally between 1992 and 2015, dividing references into three time periods. They found that the publications were more technical and method-oriented between 1992 and 1999, more clinical and patient-oriented between 2000 and 2009, and more focused on the emergence of big data, decision support, and global health between 2010 and 2015. They presented the review results visually as word clouds, cluster maps, and dashboards.
More than 10 years ago, the American Medical Informatics Association (AMIA) introduced a year-in-review process to survey the previous year's publications in the United States. The first year-in-review on medical informatics was delivered at the AMIA Annual Symposium in 2006 by Dr. Masys [3], who presented an annual review of the previous year's research publications and major events. Over time, it has become a tradition at the AMIA annual conference to break off parts of the annual review and focus on specific topics, such as biomedical informatics, translational bioinformatics, and consumer and clinical informatics.
The Scientific Program Committee of the 2016 Conference of the Asia-Pacific Association for Medical Informatics (APAMI) decided to introduce the first year-in-review on medical informatics. This was initiated by the formation of a team that reviewed articles published in the top four journals in medical informatics based on their impact factors: International Journal of Medical Informatics, Journal of the American Medical Informatics Association, Journal of Biomedical Informatics, and Journal of Medical Internet Research. In an attempt to add Asian-Pacific perspectives, articles published in English in Healthcare Informatics Research (HIR), which is an official journal of the Korean Society of Medical Informatics (KOSMI), were also included in the review.
The review group empaneled to conduct this study comprised one senior scholar and five graduate students. The group analyzed the topics and keywords of the articles published during the previous 12 months and visualized the frequencies of topics and keywords and the relationships between the keywords. This task was accomplished using reference management software for summarizing and aggregating topics and keywords from the literature, and exploring several visualization techniques, including word clouds and topic clustering. The team also selected and reviewed three notable articles in the top five topic areas.
The results of the review were presented in a plenary session of the APAMI 2016 Conference. This article is based on that presentation.

II. Methods

A multistage review process was applied to the published literature. The scope for the search was English-language articles on topics related to clinical or consumer informatics appearing between October 1, 2015 and September 30, 2016 in five refereed journals indexed in PubMed.

1. Literature Search

We searched PubMed using the search terms used for the AMIA 2015 year-in-review on clinical and consumer informatics [4] to retrieve articles published between October 1, 2015 and September 30, 2016. We limited the journals to the following top four journals on medical informatics based on their impact factors and the number of citations: Journal of Medical Internet Research, Journal of Biomedical Informatics, Journal of the American Medical Informatics Association, and International Journal of Medical Informatics. We also added articles published in the HIR (an official journal of KOSMI) for further evaluation (Table 1).

2. Data Extraction

We reviewed the titles, abstracts, and main texts of the included articles to identify their keywords and topics. Keywords were collected from the keyword section of the literature, while topics were assigned by the authors using the 14 topics used by Hersh and Ash [5] for the year-in-review at the AMIA 2014 Annual Symposium (Table 2). All of the papers considered were organized according to the topics assigned by one of the authors and re-reviewed by another author.

3. Analysis and Visualization

After collecting keywords and identifying topics, we used several approaches to visually display the data. First, we used the word clouds feature in Tableau 10.1 and R 3.3.1 to visualize the frequencies of keywords and topics. Word clouds are generated by counting the frequency at which each word appears.
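The frequency counting behind a word cloud can be sketched in a few lines. The following Python example is illustrative only: the keyword lists are hypothetical, the function name `keyword_shares` is our own, and it assumes that each percentage is a keyword's share of all keyword occurrences, which the text does not state explicitly. It is not the Tableau or R procedure used in the study.

```python
from collections import Counter

# Hypothetical keyword lists, one per article (not data from the study).
articles = [
    ["EHR", "telemedicine"],
    ["EHR", "natural language processing"],
    ["telemedicine", "remote consultation"],
    ["EHR"],
]


def keyword_shares(keyword_lists):
    """Return {keyword: share of all keyword occurrences}, most frequent first.

    Assumption: the denominator is the total number of keyword occurrences.
    """
    counts = Counter(kw for kws in keyword_lists for kw in kws)
    total = sum(counts.values())
    return {kw: n / total for kw, n in counts.most_common()}


shares = keyword_shares(articles)
for kw, share in shares.items():
    print(f"{kw}: {share:.1%}")
```

A word cloud then scales the font size of each term by its count or share.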
Second, we used social network analysis to perform a visual and mathematical analysis of keyword relationships. The nodes in the network were the keywords, while the links showed relationships or flows between the nodes. Network activity for a node was measured by using the concept of degrees corresponding to the number of direct connections to that node. We used NetMiner 4.2.2 (Cyram Inc., Seoul, Korea) for social network analysis to explore the relationships among the keywords.
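As a rough illustration of the degree concept, the sketch below builds an undirected keyword co-occurrence network in plain Python and counts the direct connections of each node. The keyword lists are hypothetical, and two modeling choices are assumptions on our part: that two keywords are linked when they appear in the same article, and that links are undirected. It does not reproduce the NetMiner analysis.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical keyword lists, one per article (not data from the study).
articles = [
    ["EHR", "telemedicine"],
    ["EHR", "NLP", "machine learning"],
    ["EHR", "HIT"],
    ["telemedicine", "remote consultation"],
]


def keyword_degrees(keyword_lists):
    """Degree of each keyword = number of distinct keywords it co-occurs with."""
    neighbors = defaultdict(set)
    for kws in keyword_lists:
        # Assumption: every pair of keywords in one article forms a link.
        for a, b in combinations(sorted(set(kws)), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {kw: len(ns) for kw, ns in neighbors.items()}


degrees = keyword_degrees(articles)
n_nodes = len(degrees)
# Normalized degree centrality: degree divided by the maximum possible degree.
centrality = {kw: d / (n_nodes - 1) for kw, d in degrees.items()}

for kw in sorted(degrees, key=degrees.get, reverse=True):
    print(f"{kw}: degree={degrees[kw]}, centrality={centrality[kw]:.2f}")
```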
Third, we used association rules to discover the strength of the relations between keywords. Association rules are created by analyzing keywords for frequent If-Then patterns and using the criteria of support, confidence, and lift to measure the strength of the rules [6]. The support of an itemset X, supp(X), is defined as the proportion of transactions in the data set that contain the itemset. The confidence of a rule, conf(X⇒Y), is defined as supp(X∪Y)/supp(X). The lift, lift(X⇒Y), is defined as supp(X∪Y)/(supp(X)supp(Y)). The association rules were analyzed using R 3.3.1.
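The three measures defined above can be computed directly from their definitions. The following minimal Python sketch treats each article's keyword set as one transaction; the transactions are hypothetical and the function names simply mirror the notation in the text. It is not the R code used in the study.

```python
# Hypothetical transactions: one keyword set per article (not study data).
transactions = [
    {"HIT", "EHR"},
    {"HIT", "EHR", "usability"},
    {"EHR", "NLP"},
    {"telemedicine"},
    {"HIT"},
]


def supp(itemset, transactions):
    """Proportion of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def conf(x, y, transactions):
    """conf(X => Y) = supp(X u Y) / supp(X); undefined when supp(X) is 0."""
    return supp(set(x) | set(y), transactions) / supp(x, transactions)


def lift(x, y, transactions):
    """lift(X => Y) = supp(X u Y) / (supp(X) * supp(Y))."""
    return supp(set(x) | set(y), transactions) / (
        supp(x, transactions) * supp(y, transactions)
    )


s = supp({"HIT", "EHR"}, transactions)
c = conf({"HIT"}, {"EHR"}, transactions)
l = lift({"HIT"}, {"EHR"}, transactions)
print(f"supp={s:.3f}, conf={c:.3f}, lift={l:.3f}")
```

A lift above 1 indicates that the two keywords appear together more often than would be expected if they were independent.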

4. Review of Selected Articles in the Top Five Topic Areas

The review process was not conducted strictly as a systematic review; rather, it was conducted to gather a broad literature base in the top five topic areas. All of the reviewers participated in a short calibration process, using sample papers, to improve consistency in applying the following 4-point scale: ‘must include,’ ‘may include,’ ‘possibly include,’ and ‘do not include.’ To be considered a ‘must include’ article, it had to provide a significant advance or novel application according to the reviewer's (admittedly subjective) opinion. Based on the reviewers' scores, three articles were selected for each topic. All of the papers included for consideration were re-reviewed by the first author.

III. Results

1. Literature Search

The PubMed search engine returned an initial total of 381 articles. This list was further filtered to eliminate articles that were—based on their citation information and abstract—clearly outside the scope of interest. This yielded 370 articles for review. Figure 1 shows a flowchart of the number of articles included or excluded according to journal.

2. Data Extraction

The literature review of 370 articles returned an initial total of 1,959 keywords, and 574 topics were assigned as presented in Table 3. There were 1,123 unique keywords after the removal of 836 duplicates.

3. Analysis and Visualization

Word clouds of keywords extracted from the five journals and topics assigned by the authors are presented in Figure 2. ‘EHR’ (Electronic Health Record) was the most frequently appearing term in both word clouds: 78.6% for keywords and 37.6% for topics. This was followed by ‘telemedicine’ (2.1%) and ‘medical informatics’ (1.2%) for keywords and clinical informatics (12.0%) and HIT implementation (7.7%) for topics.
Word clouds of keywords and topics extracted from HIR are presented in Figure 3. ‘Telemedicine’ was the most frequently appearing term in both word clouds: 17.0% for keywords and 47.1% for topics. This was followed by ‘telecommunications’ (4.5%) and ‘remote consultation’ (3.4%) for keywords and EHR (14.7%), clinical informatics (11.8%), and mHealth (11.8%) for topics.
The results of the social network analysis of the keywords extracted from articles published in the five journals are presented in Table 4. The keyword with the highest degree centrality value (DCV) was ‘EHR,’ followed by ‘telemedicine,’ ‘medical informatics,’ and ‘natural language processing’ (NLP). Analysis of the structural cohesion of keywords revealed that there were 16 cohesive blocks. The in-degree centrality of keywords indicated that ‘EHR’ was the focal point, which was related to ‘telemedicine,’ ‘medical informatics,’ ‘NLP,’ ‘health information technology’ (HIT), ‘clinical decision support systems,’ and ‘health information exchange’ (HIE) (see Figure 4).
The results of social network analysis of keywords extracted from articles published in HIR are presented in Table 5. The keyword with the highest DCV was ‘telemedicine,’ followed by ‘clinical decision support systems,’ ‘EHR,’ and ‘medical informatics application.’ The structural cohesion analysis of the keywords showed that there were four cohesive blocks, with the in-degree centrality of keywords showing that ‘telemedicine’ was the focal point (Figure 5).
Nineteen association rules that surpassed the minimum support level of 0.01 and the minimum confidence level of 0.5 were extracted from the keywords. The most meaningful association rule was ‘HIT⇒EHRs,’ with a support level of 0.038 and a confidence level of 0.778 (Table 6).
Figure 6 shows a parallel coordinates plot for the 19 association rules extracted for keywords, in which the widths of the arrows represent the support. ‘HIT’ and ‘EHR’ had the strongest support values, followed by ‘NLP’ and ‘EHR.’
In total, 954 association rules were extracted from the keywords of articles published in HIR with a minimum support level of 0.01 and a minimum confidence level of 0.5. Table 7 lists 14 rules with a minimum support of 0.1 and a minimum confidence of 0.8. The most meaningful association rule was ‘telecommunications⇒telemedicine,’ with a support level of 0.2 and a confidence level of 1.0 (this means that an article with a keyword of ‘telecommunications’ also has a keyword of ‘telemedicine’).
Figure 7 shows a parallel coordinates plot for the 14 association rules for keywords, in which the widths of the arrows represent the support. The figure indicates that keywords used in HIR articles are associated either directly or indirectly with ‘telemedicine.’
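The effect of the minimum support and confidence thresholds described above can be illustrated with a simplified rule search. The Python sketch below considers only rules with a single keyword on each side, uses hypothetical transactions, and introduces the helper name `pair_rules` for illustration; the analysis in this study was run in R and is not reproduced here.

```python
from itertools import permutations

# Hypothetical transactions: one keyword set per article (not study data).
transactions = [
    {"telecommunications", "telemedicine"},
    {"telecommunications", "telemedicine", "remote consultation"},
    {"telemedicine", "EHR"},
    {"EHR", "clinical decision support systems"},
    {"telemedicine"},
]


def pair_rules(transactions, min_supp, min_conf):
    """Return (X, Y, support, confidence) for single-keyword rules X => Y
    that meet both thresholds, strongest rules first."""
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def supp(itemset):
        return sum(itemset <= t for t in transactions) / n

    rules = []
    for x, y in permutations(items, 2):
        s_xy = supp({x, y})
        s_x = supp({x})  # > 0 because x occurs in at least one transaction
        if s_xy >= min_supp and s_xy / s_x >= min_conf:
            rules.append((x, y, s_xy, s_xy / s_x))
    # Sort by confidence, then support (both descending), then by name.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


rules = pair_rules(transactions, min_supp=0.1, min_conf=0.8)
for x, y, s, c in rules:
    print(f"{x} => {y}: support={s:.2f}, confidence={c:.2f}")
```

In this toy data, every article with the keyword ‘telecommunications’ also has the keyword ‘telemedicine,’ so that rule has a confidence of 1.0, whereas the reverse rule has a confidence of only 0.5 and is filtered out.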

4. Review of Articles in the Top Five Topic Areas

Two or three articles were reviewed in each of the following five topic areas: EHR, clinical informatics, HIT implementation, big data and analytics, and telemedicine.
Incentive programs of ‘meaningful use’ and the widespread adoption of EHRs by hospitals and physicians have led to a focus on the adoption of HIT by doctors and hospitals for consumer health, and also on eHealth. Moreover, there is a precision-medicine initiative seeking to engage a cohort of one million individuals who want to donate their data to improve the understanding of relationships between genetic, environmental, and other factors in their health and healthcare [7]. Patients and individuals using HIT could prove to be the single biggest force for innovation in health and healthcare delivery. For the informatics community to accelerate this inevitable transformation, we must trust individuals with their own information, and empower them to participate actively in their own care. The informatics community can accelerate this progress by partnering with patients to assure that research, applications, and advocacy align with the needs of the individuals. In EHR development there is a trend toward an openEHR approach. According to a longitudinal case study, key sociotechnical challenges to using the openEHR approach are the lack of technical and clinical competence in designing technical systems and modeling domains [8]. Developers and clinicians, therefore, need to work together in both arenas. The model-driven development of the openEHR approach has implications for medical practice per se in ensuring that medical concepts are standardized across practices. Regarding the adoption of EHRs, it was found that Health Information Technology for Economic and Clinical Health (HITECH) financial incentives accelerated the adoption of EHRs in small, physician-owned practices in the United States. However, the failure of the market to converge on a dominant design in the absence of interoperability will make it difficult to achieve the widespread exchange of patients’ clinical information among various healthcare provider organizations [9].
The increasing importance of clinical informatics led to the AMIA Task Force Report on CCIO (Chief Clinical Informatics Officer) Knowledge, Education, and Skillset Requirements defining the role of CCIO [10]. The CCIO role encompasses the more commonly used Chief Medical Informatics Officer and Chief Nursing Informatics Officer, as well as Chief Pharmacy Informatics Officer and Chief Dental Informatics Officer. The knowledge required of a CCIO was identified in four domains: fundamentals, clinical decision-making and care process improvement, health information systems, and leading and managing change. Informatics education and training must provide trainees with core competencies in patient care, medical knowledge, practice-based learning and improvement, skills, professionalism, and systems-based practice.
Since the launch of the clinical informatics subspecialty for physicians in 2013 in the United States, more than 1,100 physicians have used the practice and education pathways to become board-certified physicians in clinical informatics. The collective experience of the four clinical informatics fellowship programs run by Stanford University, Oregon Health & Science University, University of Illinois at Chicago, and Regenstrief Institute and accredited by the Accreditation Council for Graduate Medical Education was studied [11]. Several conclusions can be drawn from the experiences of these fellowships. First, all of these programs found significant interest in fellowship training for clinical informatics among the current generation of medical students and physicians-in-training. Second, there is no single ‘correct’ way to create a clinical informatics fellowship program. Third, although all four programs have achieved initial funding support, it is unclear whether their funding methods will be sustainable. Finally, it is critically important for all of the accredited clinical informatics fellowship programs (which now number 11 and are increasing rapidly) to share their experiences and lessons learned in order to continue to improve the training provided to all clinical informatics fellows.
Regarding HIT, provider-centric electronic records are widely available at the point of care in almost all countries according to international HIT benchmarking [12]. Twenty-nine of 38 countries had adoption rates exceeding 75%, but there are large differences in the specific data available electronically, in the functions enabled by digital solutions, and in how frequently they are used by primary-care providers. There are also large variations between countries in the proportion of acute-care facilities that engage in HIE, specifically the percentage that electronically exchange radiology results and/or images with outside organizations. There was wide cross-national variation in telehealth capacity, specifically the availability of synchronous telehealth (typically video conferencing) in acute-care facilities. Regarding personal health records (PHRs) or patient access to online services, some countries have achieved the broad adoption of these solutions in primary care for e-appointment booking and e-requests for prescription renewals and refills. However, in many countries, only a minority of primary-care practices have made these functions available to patients.
Procedural and conceptual models are being used for designing HIT. When health-care work is modeled, graphical workflow models can become too complex to be useful to designers. Conceptual models complement and simplify workflows by providing an explicit specification for the required information product. Thus, the integration of conceptual models with workflow models has been proposed [13]. This method uses concurrent engineering principles to iterate between a track for a user-centered design and a track for a conventional, technology-centered design. The objective is to converge to a matched pair of designs: a measurably better workflow of care and a cost-effective HIT application that preferentially supports that workflow.
Regarding big data and analytics, the NIH in the United States has implemented the Big Data to Knowledge (BD2K) initiative to maximize the use of biomedical big data [14], focusing on the following four areas: (1) improving the ability to locate, access, share, and use biomedical big data; (2) developing and disseminating data analysis methods and software; (3) enhancing training in biomedical big data and data science; and (4) establishing centers of excellence in data science. This initiative introduced a big-data ecosystem called the Commons, which is a shared virtual space that conforms to the ‘FAIR’ principles: the ability to find, access, interoperate, and reuse the products of the research. Many centers are supporting the BD2K initiative; for example, the BD2K Center for Causal Discovery is developing and disseminating an integrated set of open-source tools that support the causal modeling and discovery of biomedical knowledge from large and complex biomedical data sets [15]. Another example is the BD2K center for big data in translational genomics [16].
Research, development, and applications related to telemedicine are increasing worldwide, with most of the publications coming from the United States. According to a literature review on usability in telemedicine systems [17], older adults and those with cardiovascular conditions were among the largest target end-user groups. Remote monitoring systems were addressed in most (90%) of the publications, followed by training and medical education, and consultation. Questionnaires are the most common means of evaluating telemedicine systems, being utilized in 88% of the studies, followed by observations, interviews, self-descriptions, and logging. In total, 71% of the publications were trial-oriented, with the remainder being process-oriented.

IV. Discussion

We analyzed the keywords and topics of the articles published during the previous 12 months in the top four journals in medical informatics and HIR (an official journal of KOSMI). We visualized the frequencies and relationships of keywords and topics using word clouds, social network analysis, and association rules.
‘EHR’ was the most widely used keyword in the articles published in medical informatics, and it had the highest DCV. This was expected given that the rate at which HIT including EHRs is being adopted is increasing worldwide. In particular, the introduction of meaningful use regulation in the United States led to an increase in the number of studies on the use of clinical decision support systems and on the quality, usability, and interoperability of EHRs [4]. The present findings for association rules, such as strong associations between EHRs and usability, semantic interoperability, HIT, NLP, and machine learning, indicate that the adoption, implementation, and usage of EHRs are progressing. EHR was also the most widely researched topic area. EHRs were strongly associated with big data and analytics, and clinical informatics, which implies that EHRs are being used widely for patient care and research. EHRs are also associated with PHRs, which implies patient access and engagement beyond the clinical use of EHRs.
In contrast, ‘telemedicine’ was the most widely used keyword in the articles published in HIR. This could be due to the Korean government's attempt to introduce telemedicine into the Korean healthcare system. This attempt has provoked heated debates among the government, insurers, medical service providers, and consumers [18], which led to a special issue of HIR in October 2015 to address issues related to telemedicine. This explains why so many articles on telemedicine have been published. Thus, the analysis results need to be interpreted carefully by taking this into consideration.
The present review did not include methodology as a separate criterion for categorizing the articles due to the difficulty of extracting information on methodology from article titles and abstracts. Moreover, the keywords analyzed in this study covered many aspects, such as the themes, subjects, analytics, and intervention delivery methods of each study. Thus, the analysis of association rules produces very complex results, including some related to study methodologies. We therefore recommend that future year-in-review studies should include separate criteria for methodology, since this would help to provide readers with a better understanding of medical informatics.
This review included articles published in HIR, an official journal of KOSMI, since they are published in English. We also would like to form an international review panel for a future APAMI year-in-review on medical informatics so that articles published in languages other than English can be included.
Although journals on medical informatics recommend that authors use Medical Subject Headings (MeSH) terms as keywords, we found that keywords were presented in many different ways. We recommend that authors use MeSH terms as keywords to facilitate semantic interoperability.


This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (NRF-2015R1A2A2A01008207 and 2010-0028631).


Conflict of Interest: No potential conflict of interest relevant to this article was reported.


1. Herland M, Khoshgoftaar TM, Wald R. A review of data mining using big data in health informatics. J Big Data 2014;1:2.
2. Yergens DW, Tam-Tham H, Minty EP. Visualization of the IMIA yearbook of medical informatics publications over the last 25 years. Yearb Med Inform 2016;(Suppl 1):S130-S138. PMID: 27362591.
3. Masys D. AMIA informatics 2006 year in review [Internet]. Washington (DC): AMIA Annual Symposium; 2006. cited 2016 Nov 29. Available from:

4. Roberts K, Boland MR, Pruinelli L, Dcruz J, Berry A, Georgsson M, et al. Biomedical informatics advancing the national health agenda: the AMIA 2015 year-in-review in clinical and consumer informatics. J Am Med Inform Assoc 2017;24(e1):e185-e190. PMID: 27497798.
5. Hersh W, Ash J. AMIA Annual Symposium 2014 year in review [Internet]. Washington (DC): AMIA Annual Symposium; 2014. cited 2016 Nov 29. Available from:

6. Hahsler M, Chelluboina S. Visualizing association rules: introduction to the R-extension package arulesViz. R Proj Modul 2011;2011:223-238.

7. Fridsma DB. Moving beyond the physician's EHR. J Am Med Inform Assoc 2015;22(6):1277. PMID: 26555019.
8. Christensen B, Ellingsen G. Evaluating Model-Driven Development for large-scale EHRs through the openEHR approach. Int J Med Inform 2016;89:43-54. PMID: 26980358.
9. Cohen MF. Impact of the HITECH financial incentives on EHR adoption in small, physician-owned practices. Int J Med Inform 2016;94:143-154. PMID: 27573322.
10. Kannry J, Sengstack P, Thyvalikakath TP, Poikonen J, Middleton B, Payne T, et al. The Chief Clinical Informatics Officer (CCIO): AMIA Task Force report on CCIO knowledge, education, and skillset requirements. Appl Clin Inform 2016;7(1):143-176. PMID: 27081413.
11. Longhurst CA, Pageler NM, Palma JP, Finnell JT, Levy BP, Yackel TR, et al. Early experiences of accredited clinical informatics fellowships. J Am Med Inform Assoc 2016;23(4):829-834. PMID: 27206458.
12. Zelmer J, Ronchi E, Hypponen H, Lupianez-Villanueva F, Codagnone C, Nohr C, et al. International health IT benchmarking: learning from cross-country comparisons. J Am Med Inform Assoc 2017;24(2):371-379. PMID: 27554825.
13. Berry AB, Butler KA, Harrington C, Braxton MO, Walker AJ, Pete N, et al. Using conceptual work products of health care to design health IT. J Biomed Inform 2016;59:15-30. PMID: 26528606.
14. Bourne PE, Bonazzi V, Dunn M, Green ED, Guyer M, Komatsoulis G, et al. The NIH Big Data to Knowledge (BD2K) initiative. J Am Med Inform Assoc 2015;22(6):1114. PMID: 26555016.
15. Cooper GF, Bahar I, Becich MJ, Benos PV, Berg J, Espino JU, et al. The center for causal discovery of biomedical knowledge from big data. J Am Med Inform Assoc 2015;22(6):1132-1136. PMID: 26138794.
16. Paten B, Diekhans M, Druker BJ, Friend S, Guinney J, Gassner N, et al. The NIH BD2K center for big data in translational genomics. J Am Med Inform Assoc 2015;22(6):1143-1147. PMID: 26174866.
17. Klaassen B, van Beijnum BJ, Hermens HJ. Usability in telemedicine systems: a literature survey. Int J Med Inform 2016;93:57-69. PMID: 27435948.
18. Kwon IH. High time to discuss future-oriented telemedicine. Healthc Inform Res 2015;21(4):211-212. PMID: 26618025.
Figure 1

Flowchart of literature selection.

Figure 2

Word clouds of keywords and topics of articles published in the five included journals. EHR: Electronic Health Record, EMR: Electronic Medical Record, NLP: natural language processing, PHR: personal health record, HIE: health information exchange.

Figure 3

Word clouds with keywords and topics of articles published in Healthcare Informatics Research. EHR: Electronic Health Record, HIE: health information exchange.

Figure 4

Visualization obtained by social network analysis of keywords of the articles published in the five journals. EHR: Electronic Health Record, NLP: natural language processing.

Figure 5

Visualization obtained by social network analysis of keywords of the articles published in Healthcare Informatics Research. EHR: Electronic Health Record.

Figure 6

Parallel coordinate plot for 19 association rules for keywords of the articles published in the five journals. PHR: personal health record, EHR: Electronic Health Record, NLP: natural language processing.

Figure 7

Parallel coordinate plot for 14 association rules for keywords of the articles published in Healthcare Informatics Research.

Table 1

PubMed search criteria

Table 2

Topics covered in year-in-review

Table 3

Number of keywords and topics of articles reviewed

Table 4

DCV of keywords extracted from articles published in the five journals


DCV: degree centrality value, EHR: Electronic Health Record.

Table 5

DCV of keywords extracted from articles published in Healthcare Informatics Research


DCV: degree centrality value, EHR: Electronic Health Record.

Table 6

Results of association rules among keywords extracted from articles published in the five journals


EHR: Electronic Health Record, NLP: natural language processing, PHR: personal health record, HIT: health information technology.

Table 7

Results of association rules among extracted keywords from Healthcare Informatics Research
