Knowledge Structure of Korean Medical Informatics: A Social Network Analysis of Articles in Journal and Proceedings

Article information

Healthc Inform Res. 2010;16(1):52-59

Publication date (electronic) : 2010 March 31

doi : https://doi.org/10.4258/hir.2010.16.1.52

Senator Jeong, PhD, Soo Kyoung Lee, RN, Hong-Gee Kim, PhD

Biomedical Knowledge Engineering Laboratory, Seoul National University, Seoul, Korea.

Corresponding Author: Hong-Gee Kim, PhD. Biomedical Knowledge Engineering Laboratory, Seoul National University, 28-22 Yeongeon-dong, Jongno-gu, Seoul 110-749, Korea. Tel: +82-2-740-8796, Fax: +82-2-743-8706, hgkim@snu.ac.kr

Received 2009 November 27; Accepted 2010 March 16.

Abstract

Objectives

This study aimed at exploring the knowledge structure of Korean medical informatics.

Methods

We used keywords as the main variables and, as cases, the English titles and abstracts of the research papers (n = 915) published in the journal and symposia of the Korean Society of Medical Informatics from 1995 through 2008. N-grams (bigram to 5-gram) were extracted from the corpora using the BiKE Text Analyzer, their co-occurrence networks were generated via a cosine correlation coefficient, and the networks were then analyzed and visualized using Pajek.

Results

With the hub and authority measures, the most important research topics in Korean medical informatics were identified. Newly emerging topics by three-year period units were observed as research trends.

Conclusions

This study provides a systematic overview on the knowledge structure of Korean medical informatics.

Keywords: Medical Informatics; Knowledge Structure; Social Network Analysis; Co-word Analysis

I. Introduction

Characterizing a domain of study presupposes an understanding of its knowledge structure. Since the field's inception in 1987, there have been few efforts to understand Korean medical informatics in a quantitative way. The purpose of this paper was to quantitatively analyze the knowledge structure of Korean medical informatics.

There are a number of qualitative studies on biomedical informatics [1-3]. These research efforts relied heavily on experts' knowledge, experience, and intuition, and thus may lack an objective and quantitative understanding of the structure of the field. Mainstream approaches to exploring the knowledge structure of a domain are quantitative methods such as co-citation analysis (CCA) and co-word analysis (CWA), which gauge topical closeness or similarity at the level of authors, papers, journals, or disciplines.

The most extensively used method, CWA, can reveal the overall picture, broad landscape, and boundaries of a given field. CWA uses patterns of co-occurrence of pairs of words or phrases in a corpus of texts to identify the relationships between ideas within the subject areas presented in the texts [4]. Since their introduction [5], CWA techniques have been used to explore knowledge structure [6-8], analyze research trends [4,7,9-11], and generate hypotheses and discover knowledge [12-16]. In this study, we use CWA to explore the knowledge structure of Korean medical informatics.

The following questions guided this research: 1) What are the important topics of Korean medical informatics? 2) What are the newly emerging research topics?

This paper begins with data and analysis methods to find the knowledge structure of Korean medical informatics. Then research findings such as important research topics and their contexts, and newly emerging topics are covered. Finally, we discuss the major findings and conclude the paper.

II. Methods

In this study, we adopted a well-established co-word analysis protocol. It involves the following steps: 1) select the text corpus for the study; 2) extract and normalize the terms and compute term weights; 3) build a term co-occurrence frequency matrix for the corpus; 4) compute term-term relatedness; 5) analyze and visualize the term-term relatedness matrix. For this study, BiKE Text Analyzer (BTA), a Java application, was used.

1. Data Collection and Treatment

The time window for the target data was set as the 14 years from 1995 to 2008. We collected the titles and abstracts of 1,075 papers published in the journal and symposia of the Korean Society of Medical Informatics. Papers without abstracts were excluded, leaving 915 for analysis. For consistency, the Korean titles and abstracts of 295 papers were translated into English using Google Translator Toolkit. The translated English terms were then corrected in three cases: terms whose meaning differed from the original ("화상진료시스템" ==> "Burn care system" --> "telemedicine system"), terms that had been Romanized as pronounced ("기록지" ==> "girokji" --> "record"), and terms that had not been rendered as medical terminology ("검사" ==> "inspection" --> "lab test").

In a co-word analysis, the critical step is to create a list of terms that constitute the variables for analysis. Our variables were created from a combination of sources: 1) symposium topic lists; 2) author keywords and biomedical informatics keywords from Thomson Web of Science; and 3) MeSH descriptors. Research topics were collected from the call-for-papers topics and session titles of all symposia. The collected terms were added to the Vocabulary Manager of BTA. The Vocabulary Manager automatically removes duplicate terms and manages n-grams (up to 5-grams), and it allows users to load new vocabularies and input new terms (Figure 1).

Figure 1

Analysis workflow and BiKE Text Analyzer.

2. Term Extraction and Normalization

Topics are difficult to represent with single words because single words often have more than one meaning. In most cases, topics are appropriately described by multi-word phrases, which, especially in research domains, are much more interpretable [17]. In this study, we therefore view a topic as a multi-word phrase rather than a single word. Before extracting phrases, we applied a less strict normalization strategy to obtain precise variables: the plural forms of tokenized words were singularized (eg, records to record) and abbreviations were controlled with synonym lists (eg, EHR = electronic health record). From the paper corpora, we extracted n-gram terms (from 2-grams to 5-grams) as variables using BTA (total number of n-gram terms = 2,954). The most frequently occurring term was "information system" (term frequency = 533). After excluding the terms that occurred fewer than 5 times or were inappropriate as variables (eg, "two type"), 748 term variables remained for analysis.
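The normalization and n-gram extraction described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the BTA implementation: the `SYNONYMS` table, the naive `singularize` rule, and all function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical synonym list; the lists actually used in the study are not given.
SYNONYMS = {"ehr": "electronic health record"}


def singularize(word):
    """Naive plural stripping (an assumption, not the BTA algorithm)."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith(("ss", "us", "is")):
        return word[:-1]
    return word


def normalize(text):
    """Lowercase, tokenize, expand abbreviations, and singularize each word."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    out = []
    for tok in tokens:
        expanded = SYNONYMS.get(tok, tok)
        out.extend(singularize(w) for w in expanded.split())
    return out


def extract_ngrams(tokens, n_min=2, n_max=5):
    """Return every contiguous n-gram for n in [n_min, n_max]."""
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def count_terms(docs, vocabulary, min_freq=1):
    """Count vocabulary n-grams over all documents, dropping rare terms."""
    counts = Counter()
    for doc in docs:
        grams = extract_ngrams(normalize(doc))
        counts.update(g for g in grams if g in vocabulary)
    return {t: c for t, c in counts.items() if c >= min_freq}
```

Calling `count_terms(docs, vocabulary, min_freq=5)` would mirror the frequency cutoff used in the study; the manual removal of inappropriate terms such as "two type" is not modeled here.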

3. Term Weight

Since not all terms have the same importance in a document, the weight of each term was calculated by multiplying its term frequency (TF) by its inverse document frequency (IDF):

$$w_{i,j} = \mathrm{tf}_{i,j} \times \mathrm{idf}_i = \frac{f_{i,j}}{\sum_k n_{k,j}} \times \log\frac{N}{n_i}$$

where f_i,j is the number of times the term i appears in the document j, Σ_k n_k,j is the total number of terms in the document j, N is the total number of documents, and n_i is the total number of documents containing the term i.
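As a rough illustration of this weighting, the sketch below computes a TF-IDF weight per term and document. It assumes the common form in which TF is the term count divided by the document length and IDF is the natural logarithm of N / n_i; the logarithm base is not stated in the source, and the function name `tfidf` is hypothetical.

```python
import math
from collections import Counter


def tfidf(doc_terms):
    """doc_terms: one list of extracted terms per document.

    Returns one {term: weight} dict per document.
    """
    n_docs = len(doc_terms)
    # n_i: number of documents that contain term i at least once
    doc_freq = Counter()
    for terms in doc_terms:
        doc_freq.update(set(terms))

    weights = []
    for terms in doc_terms:
        counts = Counter(terms)        # f_ij for this document
        total = sum(counts.values())   # total number of terms in the document
        weights.append({
            t: (c / total) * math.log(n_docs / doc_freq[t])
            for t, c in counts.items()
        })
    return weights
```

A term that occurs in every document receives weight 0 under this form, which is the intended effect of the IDF factor.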

4. Term Co-occurrence and Closeness Matrix

The co-occurrence analysis approach quantifies term co-occurrences in documents. It assumes that the more frequently two terms appear together in the same document, the more closely related they are [18]. BTA generates a term co-occurrence frequency matrix (748 × 748) and then transforms it into a cosine correlation matrix, in which each cell indicates the relative closeness of a term pair on a 0-1 scale. The cosine measure is defined as the cosine of the angle enclosed between two term vectors x and y:

$$\cos(x, y) = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert} = \frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2}\,\sqrt{\sum_i y_i^2}}$$
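The two matrix steps can be sketched as follows. The sketch rests on an assumption the source does not confirm: each term vector is taken to be that term's row in the co-occurrence matrix, with the diagonal holding the term's document frequency. All function names are hypothetical.

```python
import math
from itertools import combinations


def cooccurrence_matrix(doc_terms, vocab):
    """Count, for each term pair, the documents in which both terms occur."""
    index = {t: i for i, t in enumerate(vocab)}
    n = len(vocab)
    matrix = [[0] * n for _ in range(n)]
    for terms in doc_terms:
        present = sorted({index[t] for t in terms if t in index})
        for i in present:
            matrix[i][i] += 1          # diagonal: document frequency
        for i, j in combinations(present, 2):
            matrix[i][j] += 1          # keep the matrix symmetric
            matrix[j][i] += 1
    return matrix


def cosine(x, y):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def cosine_matrix(matrix):
    """Pairwise cosine between the rows of a co-occurrence matrix."""
    return [[cosine(r1, r2) for r2 in matrix] for r1 in matrix]
```

Because co-occurrence counts are non-negative, every cosine value falls in the 0-1 range mentioned in the text.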

5. Converting Matrix to Network, Visualization, & Analysis

The term-term closeness matrix was converted to a social network showing the binary relationships between any two terms. This network provides a useful medium for representing the topical structure of Korean medical informatics in a concise and intuitive manner. Pajek software was used for network visualization and analysis. The node size equals the logarithm of the term frequency, and the thickness of a line indicates the cosine value (closeness) between a pair of terms.
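One way to hand such a matrix to Pajek is to write it in Pajek's plain-text .net format (a `*Vertices` block followed by an `*Edges` block). The sketch below is an assumption about how the conversion might be done, not a description of BTA; the function names are hypothetical, and the default cutoff of 0.15 is borrowed from the Figure 2 caption.

```python
import math


def to_pajek_net(terms, closeness, min_cosine=0.15):
    """Serialize a symmetric closeness matrix as Pajek .net text.

    Edges whose cosine value is below min_cosine are dropped.
    """
    lines = [f"*Vertices {len(terms)}"]
    for i, term in enumerate(terms, start=1):   # Pajek vertices are 1-based
        lines.append(f'{i} "{term}"')
    lines.append("*Edges")
    for i in range(len(terms)):
        for j in range(i + 1, len(terms)):      # upper triangle only
            w = closeness[i][j]
            if w >= min_cosine:
                lines.append(f"{i + 1} {j + 1} {w:.4f}")
    return "\n".join(lines) + "\n"


def node_sizes(term_freq):
    """Node size as the natural logarithm of term frequency (base assumed)."""
    return {t: math.log(f) for t, f in term_freq.items() if f > 0}
```

The edge weight carries the cosine value, so line thickness in the drawing can be tied to it directly.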

III. Results

1. Top Research Topics

Some terms have links with many other terms; their network of co-occurrences is quite extensive, and they occupy a central position in a field. To identify the important research topics in Korean medical informatics, authority and hub scores were calculated for each topic. In social network analysis, if a vertex points to many good authorities, it is a good hub, and if a vertex is pointed to by many good hubs, it is a good authority [19]. Writing a(T_i) for the authority score and h(T_i) for the hub score of a topic T_i, the scores are defined as:

$$a(T_i) = \sum_{T_j \to T_i} h(T_j), \qquad h(T_i) = \sum_{T_i \to T_j} a(T_j)$$

The authority score a(T_i) of a topic i equals the sum of the hub scores of all topics T_j that point to it. The hub score h(T_i) of a topic i equals the sum of the authority scores of all topics that it points to. Authority scores and hub scores thus mutually reinforce each other. As shown in Table 1, we extracted the 50 most important topics in Korean medical informatics during the past 14 years (1995-2008). Table 2 shows that the top 19 topics with high authority scores account for about 3.5% of the 748 topics.
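The mutual reinforcement of hub and authority scores is usually computed by iterating the two update rules and normalizing after each step, as in Kleinberg's algorithm [19]. The sketch below is a minimal version under assumptions: the network is given as a weighted adjacency matrix in which `adj[i][j]` is the weight of the link from topic i to topic j, the iteration count is fixed, and the function name `hits` is hypothetical. The source does not state how link direction was defined.

```python
import math


def hits(adj, iterations=50):
    """Iteratively compute (hub, authority) scores for a weighted network."""
    n = len(adj)
    hubs = [1.0] * n
    auths = [1.0] * n
    for _ in range(iterations):
        # Authority: sum of hub scores of the topics pointing to i.
        auths = [sum(adj[j][i] * hubs[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(a * a for a in auths)) or 1.0
        auths = [a / norm for a in auths]
        # Hub: sum of authority scores of the topics that i points to.
        hubs = [sum(adj[i][j] * auths[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(h * h for h in hubs)) or 1.0
        hubs = [h / norm for h in hubs]
    return hubs, auths
```

Normalizing to unit length after each update keeps the scores bounded; only the relative ranking of the topics is meaningful.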

Table 1

Authority weights of top 50 important topics in Korean medical informatics

Table 2

Statistics of 748 research topics in Korean medical informatics

As shown in Figure 2, the 50 most important topics were grouped into 12 clusters, including information system, decision support system, picture archiving and communication system, electronic health record, electronic medical record, XML, university hospital, and healthcare. These can be compared with the findings of a global-scale study, in which physician order entry and practice guideline are among the major topics, as shown in Figure 3. It is also interesting that the information system group (upper right corner of Figure 2) is closely associated with nursing science topics such as nursing record, nursing process, nursing activity, and nursing information.

Figure 2

Top 50 important topics of Korean medical informatics. Edges with cosine values lower than 0.15 in the original network were removed, and the remaining network was clustered by component (tf ≥ 5; N = 50; cosine ≥ 0.15; k-component ≥ 1; component = 9). Nine components generate 12 groups. Contour lines were drawn by hand.
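The edge removal and component clustering mentioned in the caption can be approximated with a union-find pass over the edges that survive the cosine cutoff. This is a sketch of ordinary connected components only; it does not reproduce Pajek's k-component routine or the hand-drawn grouping, and the function name `components` is hypothetical.

```python
def components(n, edges, min_weight=0.15):
    """Connected components after dropping edges below min_weight.

    n: number of vertices (0-based); edges: iterable of (i, j, weight).
    """
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, w in edges:
        if w >= min_weight:
            parent[find(i)] = find(j)

    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())
```

Raising the cutoff splits the network into more, smaller components, which is why the threshold is reported alongside the figure.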

Figure 3

Top 100 important topics of medical informatics on a global scale (tf ≥ 10; N = 100; cosine ≥ 0.1; κ-component ≥ 1; component = 40). Adapted from [20].

2. Research Topic Trends

To investigate newly emerging topics, we calculated term frequencies for each 3-year period (2 years for 2007-2008) and identified the topics that fell in the lowest 10% by frequency in the preceding period(s) and then remained in the highest 10% (5% for 2007-2008) in the following periods.
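One possible reading of this selection rule is sketched below: a topic counts as emerging in a given period if it sits in the bottom fraction by frequency in every earlier period and in the top fraction in that period and every later one. The exact rule used in the study (including tie handling and the 5% cutoff for 2007-2008) is not fully specified, so the function names, the single `fraction` parameter, and the tie-breaking by topic name are all assumptions. Every period dictionary is expected to contain the same topics, with 0 for topics that did not occur.

```python
def rank_cutoffs(freqs, fraction):
    """Return (low_set, high_set): topics in the bottom/top fraction by frequency."""
    ranked = sorted(freqs, key=lambda t: (freqs[t], t))  # ties broken by name
    k = max(1, int(len(ranked) * fraction))
    return set(ranked[:k]), set(ranked[-k:])


def emerging_topics(period_freqs, period_index, fraction=0.10):
    """Topics that are low in all earlier periods and high from period_index on.

    period_freqs: list of {topic: frequency} dicts, one per period, in order.
    """
    lows = [rank_cutoffs(f, fraction)[0] for f in period_freqs[:period_index]]
    highs = [rank_cutoffs(f, fraction)[1] for f in period_freqs[period_index:]]
    return sorted(
        t for t in period_freqs[period_index]
        if all(t in low for low in lows) and all(t in high for high in highs)
    )
```

For the first period there are no earlier periods, so the rule reduces to selecting topics that stay in the top fraction throughout.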

Throughout the past 14 years (1995-2008), information system, medical record, hospital information, management system, hospital information system, health information, web based, and information technology have occupied the top 5% of Korean medical informatics research topics. Newly emerging research topics include nursing informatics and electronic medical record during 1998-2000, and consumer health during 2001-2003. Electronic health record system, information extraction, and ubiquitous healthcare became newly popular topics during 2004-2006, and oriental nursing, bioinformatics, ubiquitous computing, and personal health device are among the newly emerging topics during 2007-2008 (Figures 4-6).

Figure 4

Newly emerging research topics in Korean medical informatics during the years 1998-2000.

Figure 6

Newly emerging research topics in Korean medical informatics during the years 2007-2008.

IV. Discussion

The social network analysis of research topics communicated through the KOSMI journal and symposia provides a systematic overview of the knowledge structure of Korean medical informatics. Our analysis suggests that Korean medical informatics has paid attention to information artifacts such as EHR (EMR), CDSS, and PACS, but less to methodological topics (eg, machine learning, natural language processing, support vector machine) and their applications (eg, computerized physician order entry, clinical practice guideline), which are among the major topics on the global scale. Since the early 2000s, bioinformatics-related topics (eg, expression data) have been emerging on the global scale [20], whereas in Korea they have emerged only since 2007-2008 (eg, bioinformatics data). These findings suggest that Korean medical informatics should be equipped with more rigorous methodologies and pay more attention to bioinformatics.

Several contributions of this study are notable. This study provides topic networks for a systematic understanding of Korean medical informatics and helps to gain a first insight into the main research interests in Korea. Our research trend analysis also helps to decide which technologies and themes should be included in medical informatics curricula to meet learners' ever-changing needs. In addition, the methodology used in this study has implications for hypothesis generation and knowledge discovery, which have been demonstrated in many studies [21]. One may analyze, for example, the relationship between chief complaint and disease.

There are technical limitations to our study. The use of N-grams as the unit of text analysis, which has several advantages, can also be viewed as a disadvantage: N-grams may miss important topics expressed as single words (eg, ontology). In our study, however, even combining single-word and N-gram topics did not yield a desirable result, since single words hardly bear context; they lose the fine-grained distinctions among topics and the highly pre-coordinated meaning that multi-word topics carry. We do not regard the topics covered in this study as wholly definitive of medical informatics in Korea, nor does the corpus used in this study encompass all research efforts in Korea. We simply claim that the corpus is representative of those studies and should suffice for our main purpose.

Further research may include comparative analysis of Chinese, Japanese and Korean medical informatics.

Acknowledgements

This paper is an extended version of our previous work [22]. This work was supported by the Korean Research Foundation Grant funded by the Korean Government (KRF-2008-562-D00035).

Notes

No potential conflict of interest relevant to this article was reported.

References

1. Hasman A, Haux R. Modeling in biomedical informatics: an exploratory analysis part 1. Methods Inf Med 2006;45:638–642. 17149505.

2. Hasman A, Haux R. Modeling in biomedical informatics: an exploratory analysis part 2. Int J Med Inform 2007;76:96–102. 17113824.

3. Maojo V, Kulikowski CA. Bioinformatics and medical informatics: collaboration on the road to genomic medicine? J Am Med Inform Assoc 2003;10:515–522. 12925552.

4. He Q. Knowledge discovery through co-word analysis. Lib Trends 1999;48:133–159.

5. Callon M, Courtial JP, Turner WA, Bauin S. From translations to problematic networks: an introduction to coword analysis. Soc Sci Inform 1983;22:191–235.

6. Morris TA. Structural relationships within medical informatics. Proc AMIA Symp 2000:590–594. 11079952.

7. Bansard JY, Rebholz-Schuhmann D, Cameron G, Clark D, van Mulligen E, Beltrame E, Barbolla E, Martin-Sanchez Fdel H, Milanesi L, Tollis I, van der Lei J, Coatrieux JL. Medical informatics and bioinformatics: a bibliometric study. IEEE Trans Inf Technol Biomed 2007;11:237–243. 17521073.

8. Mane KK, Börner K. Mapping topics and topic bursts in PNAS. Proc Natl Acad Sci U S A 2004;101(Suppl 1):5287–5290. 14978278.

9. Garfield E. Mapping the world of biomedical engineering: Alza lecture (1985). Ann Biomed Eng 1986;14:97–108. 3740571.

10. Pickens J, MacFarlane A. Term context models for information retrieval. Proceedings of 15th ACM International Conference on Information and Knowledge Management 2006. Nov 5-11; Arlington, VA. p. 559–560.

11. Rebholz-Schuhmann D, Cameron G, Clark D, van Mulligen E, Coatrieux JL, Del Hoyo Barbolla E, Martin-Sanchez F, Milanesi L, Porro I, Beltrame F, Tollis I, Van der Lei J. SYMBIOmatics: synergies in medical informatics and bioinformatics - exploring current scientific literature for emerging topics. BMC Bioinformatics 2007;8(Suppl 1):S18. 17430562.

12. Stegmann J, Grohmann G. Hypothesis generation guided by co-word clustering. Scientometrics 2003;56:111–135.

13. Swanson DR. Undiscovered public knowledge. Libr Q 1986;56:103–118.

14. Swanson DR. Fish oil, Raynaud's syndrome, and undiscovered public knowledge. Perspect Biol Med 1986;30:7–18. 3797213.

15. Stegmann J, Grohmann G. Transitive text mining for information extraction and hypothesis generation [Internet] 2005. cited 2008 Jul 10. Available from: http://arxiv.org/abs/cs/0509020.

16. Swanson DR. Migraine and magnesium: eleven neglected connections. Perspect Biol Med 1988;31:526–557. 3075738.

17. Mann GS, Mimno D, McCallum A. Bibliometric impact measures leveraging topic analysis. Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries 2006. June 11-15; Chapel Hill, NC. p. 65–74.

18. Noyons E. Bibliometric mapping of science in a policy context. Scientometrics 2001;50:83–98.

19. Kleinberg JM. Authoritative sources in a hyperlinked environment. J ACM 1999;46:604–632.

20. Jeong S, Kim HG. Intellectual structure of biomedical informatics reflected in scholarly events. Scientometrics Epub 2010 Feb 11. DOI: 10.1007/s11192-010-0166-z.

21. Bekhuis T. Conceptual biology, hypothesis discovery, and text mining: Swanson's legacy. Biomed Digit Libr 2006;3:2. 16584552.

22. Jeong S, Lee SK, Kim HG. Knowledge structure of Korean medical informatics. Proceedings of CJKMI Fall Conference 2009. Oct 30-31; Daejeon, KR. p. 49–51.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.