Cancer-related Keywords in 2023: Insights from Text Mining of a Major Consumer Portal

Article information

Healthc Inform Res. 2024;30(4):398-408
Publication date (electronic): 2024 October 31
doi: https://doi.org/10.4258/hir.2024.30.4.398
Cancer Knowledge & Information Center, National Cancer Control Institute, National Cancer Center, Goyang, Korea
Corresponding Author: Jae Kwan Jun, Cancer Knowledge & Information Center, National Cancer Control Institute, National Cancer Center, 323 Ilsan-ro, Ilsandong-gu, Goyang 10408, Korea. Tel: +82-31-920-2184, E-mail: jkjun@ncc.re.kr (https://orcid.org/0000-0003-1647-0675)
*These authors contributed equally to this work.
Received 2024 May 13; Revised 2024 October 2; Accepted 2024 October 3.

Abstract

Objectives

With the growing importance of monitoring cancer patients’ internet usage, there is an increasing need for technology that expands access to relevant information through text mining. This study analyzed internet articles from portal sites in 2023 to identify trends in the information available to cancer patients and to derive meaningful insights.

Methods

This study analyzed 19,578 news articles published on Naver, a major Korean portal site, from January 1, 2023, to December 31, 2023. Natural language processing, text mining, network analysis, and word cloud analysis were employed. The search term “am” (Korean for “cancer”) was used to identify keywords related to cancer.

Results

In 2023, an average of 1,631 cancer-related articles were published monthly, with a peak of 1,946 in September and a low of 1,371 in February. A total of 132,456 keywords were extracted, with “cure” (2,218 occurrences) the most frequent overall and “lung cancer” (1,652) and “breast cancer” (1,235) the most frequently mentioned cancer types. Term frequency-inverse document frequency analysis ranked “struggle” (1064.172) as the most significant keyword, followed by “lung cancer” (839.988) and “breast cancer” (744.840). Network analysis revealed four distinct clusters focusing on treatment, celebrity-related issues, major cancer types, and cancer-causing factors.

Conclusions

The analysis of cancer-related keywords in 2023 indicates that news articles often prioritize gossip over essential information. These findings provide foundational data for future policy directions and strategies to address misinformation. This study underscores the importance of understanding the nature of cancer-related information consumed by the public and offers insights to guide official policies and healthcare practices.

I. Introduction

The significance of cancer information is widely acknowledged, particularly in light of the changing prevalence of the disease worldwide and its profound impact on families, economies, and societies [1]. With the rise of the internet, particularly social media platforms, cancer patients are increasingly turning to these channels to share their experiences, connect with support networks, and exchange information related to cancer. To alleviate the socio-economic burden on cancer patients, it is crucial to provide accurate and essential health information [2].

Cancer patients and their families actively seek medical information through various channels, leading to exposure to a wide range of sources. However, the spread of misinformation—incorrect or misleading information presented as fact—presents significant challenges for these individuals. A prominent example is the fenbendazole case in Korea [3]. Originally developed as an anthelmintic for dogs, fenbendazole became controversial due to the proliferation of false claims on social media about its supposed efficacy in curing cancer when ingested [4].

As the importance of monitoring the information that cancer patients access on the internet continues to grow, there is a corresponding need to develop technology that can enhance the accessibility and usefulness of information found in published literature through text mining [5]. Text mining involves automatically extracting information from various written resources and transforming unstructured text into a structured format to identify meaningful patterns and uncover new insights [6]. This technique has emerged as a potential solution for bridging the gap between free-text and structured representation of cancer information [7]. It enables the extraction of valuable information and knowledge from extensive textual data and is now widely applied in biomedical research [8]. Some studies have employed text-mining technology to uncover new insights, thereby contributing to advancements in biomedical research, particularly in the field of malignant diseases such as cancer [8].

An accurate understanding of the latest trends in cancer-related information consumption is crucial. In this context, “cancer-related information” refers to data spanning the entire cancer control continuum, which influences aspects of cancer prevention, screening, diagnosis, treatment, and survivorship [9]. Topic modeling has facilitated the identification of keywords in cancer-related information accessed by individuals, providing a comprehensive analysis and visualization of cancer-related messages [10]. As a widely utilized statistical methodology, topic modeling examines the words within original texts to uncover hidden themes or topics, and explores how each topic is interconnected and evolves over time [11]. This technique offers the advantage of producing objective and clear analytical results through the statistical analysis of research topics.

As online news consumption continues to grow, a variety of platforms for accessing news content have emerged, with portal sites being a notable example. Portal-based news services act as relatively unbiased aggregators, offering a broad selection of articles from various media outlets [12]. Additionally, online news is typically available free of charge, which enhances public accessibility. According to the Reuters Institute for the Study of Journalism, South Korea has the highest reliance on search engines, such as portal sites, for digital news consumption among 46 surveyed countries [13]. Specifically, 72% of Korean users reported using search engines as their primary source for online news, a proportion that is twice the average across the surveyed countries [13]. As the incidence of cancer increases among older adults, so does interest in cancer-related information, with portal-based news articles being the most widely consumed source for such information.

Therefore, this exploratory study aimed to collect and analyze internet articles posted on a portal site throughout 2023. The goal was to identify trends in the information available to cancer patients and to derive meaningful implications. The study focused on identifying keywords in cancer-related articles from 2023 to understand the types of information that cancer patients encountered during this period. It offers a comprehensive view of cancer-related information consumption trends in South Korea in 2023, highlighting unique aspects such as the impact of portal-based news on public understanding and the role of text mining in uncovering insights not addressed in previous studies. In doing so, the study lays a basis for delivering information tailored to the specific needs of cancer patients.

II. Methods

1. Study Design and Data Collection

This exploratory study aimed to identify and evaluate the types of cancer-related information that the public encounters and consumes. Social media has significantly transformed how news is produced and consumed, influencing the public’s interpretation of various issues [14]. To examine trends in cancer-related news, we collected the titles of news articles published between January 1, 2023, and December 31, 2023, from Naver, a leading Korean portal site. For text analysis, we selected articles from Naver’s news section using the search term “am” (Korean for “cancer”). A total of 19,578 news articles were gathered and organized chronologically to ensure comprehensive monthly coverage. The text was then segmented into Korean word units for further analysis. All data were analyzed and visualized using Python 3.11.4 (Python Software Foundation, Wilmington, DE, USA).

Institutional Review Board review was waived for this study because it used only online article data.

2. Natural Language Processing in Text Mining

Text mining and natural language processing (NLP) have received extensive attention for their advanced capabilities in managing and analyzing text-based information [15]. Considering that text is the predominant data type in all stages of data construction management, with over 80% of data being unstructured, it is crucial to effectively retrieve specific textual information from documents [15]. Moreover, NLP includes techniques such as morpheme analysis, and word and sentence generation, which are essential for text mining applications. Once relevant text documents are retrieved, the character strings must be processed to enable computer analysis. Therefore, the input must be specifically formatted to allow computers to understand natural language in the same way humans do [16]. NLP utilizes a range of linguistically inspired techniques, including syntactic parsing with formal grammar and lexicons, which aid in the semantic interpretation of textual data [17].

3. Data Analysis

1) Data preprocessing

In the data preprocessing phase, article titles were retrieved using the BeautifulSoup and Pandas libraries (version 2.1.4). Special characters, except for Korean, numbers, and English, were removed using regular expressions. Unnecessary spaces were also eliminated, resulting in a clean corpus that enhanced data quality and facilitated subsequent text analysis. Nouns were extracted from the corpus using the Mecab module from the KoNLPy library (version 0.6.0). To concentrate on meaningful terms, single-character nouns were excluded, and noun frequencies were calculated using the Counter object.
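The cleaning and counting steps in this paragraph can be sketched roughly as follows. This is a minimal illustration rather than the authors' code: the names `clean_title` and `count_tokens` are hypothetical, the character class keeps only complete Hangul syllables, digits, and English letters (an assumption about what "Korean, numbers, and English" covers), and a plain whitespace split stands in for the KoNLPy noun extractor so that the example has no external dependencies.

```python
import re
from collections import Counter

# Anything that is not a digit, an English letter, a complete Hangul
# syllable, or whitespace is treated as a special character (assumption).
_NOT_ALLOWED = re.compile(r"[^0-9A-Za-z가-힣\s]")
_SPACES = re.compile(r"\s+")


def clean_title(title: str) -> str:
    """Replace special characters with spaces and collapse extra whitespace."""
    text = _NOT_ALLOWED.sub(" ", title)
    return _SPACES.sub(" ", text).strip()


def count_tokens(titles, tokenize=str.split):
    """Count tokens across titles, skipping single-character tokens.

    `tokenize` is a stand-in for a real noun extractor; by default it
    only splits on whitespace.
    """
    counts = Counter()
    for title in titles:
        tokens = tokenize(clean_title(title))
        counts.update(token for token in tokens if len(token) > 1)
    return counts
```

In practice the `tokenize` argument would be replaced by a morphological analyzer's noun-extraction method; the single-character filter and the `Counter` tally would stay the same.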

To address the issue of out-of-vocabulary (OOV) words that were not captured by the morphological analyzer, we employed the LRNounExtractor_v2 algorithm from the Soynlp library (version 0.0.493). Proper management of OOV words is crucial because their omission can significantly impact the performance of NLP models [18]. The LRNounExtractor_v2 algorithm identifies noun candidates from large corpora using unsupervised learning and calculates a reliability score based on word frequency and contextual information.

2) Frequency analysis

The primary objective of this study was to identify prominent keywords for each month, as well as for the entire year of 2023 (Figure 1). Text mining, a technique that transforms unstructured text data into a structured format, was employed to analyze hidden patterns and relationships, thereby extracting meaningful insights.

Figure 1

Number of relevant news articles published each month.

This study utilized term frequency (TF) and term frequency-inverse document frequency (TF-IDF) analyses to identify keywords from cancer-related articles following text preprocessing. The Counter function from the Collections library (version 2.1.1) was used to compute TF values, and the top 100 high-frequency keywords were selected for further analysis. The data were then transformed into a data frame, and visual word clouds were created using an online tool (https://www.wordclouds.com) to emphasize prominent cancer-related terms for each month (Figure 2).
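A rough sketch of how overall and per-month TF rankings might be derived from tokenized titles is given here. The function name `top_keywords` and the `(month, tokens)` record layout are assumptions made for illustration; the source only states that a Counter was used and that the top 100 keywords were kept.

```python
from collections import Counter, defaultdict


def top_keywords(records, n=100):
    """Return the n most frequent tokens overall and for each month.

    `records` is an iterable of (month, tokens) pairs, where `month` is
    any sortable label (for example 1..12) and `tokens` is a list of
    already-extracted keywords for one article title.
    """
    overall = Counter()
    by_month = defaultdict(Counter)
    for month, tokens in records:
        overall.update(tokens)
        by_month[month].update(tokens)
    top_overall = overall.most_common(n)
    top_by_month = {
        month: counter.most_common(n)
        for month, counter in sorted(by_month.items())
    }
    return top_overall, top_by_month
```

The per-month dictionary has the same shape as a monthly keyword table, with one ranked list per month.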

Figure 2

Results of keyword extraction by frequency. The results show a translation from the original Korean (A) to English (B), with personal names anonymized to ensure privacy.

TF-IDF values were calculated for the top 100 TF-based keywords from the news title dataset. TF-IDF, a common tool in morphological analysis, evaluates the importance of specific terms by integrating a two-dimensional TF matrix with a scalar IDF value [19]. Words that appear frequently in a single document or a small group of documents typically achieve higher TF-IDF scores. It is crucial to recognize that while TF-IDF considers word frequency, it does not incorporate regularization [19]. The TfidfVectorizer class from scikit-learn (version 1.5.2) was utilized in Google Colab to compute the TF-IDF values, which were then stored in a sparse matrix format. This matrix was aggregated by column to assess the overall significance of each word across the dataset.
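To make the column-wise aggregation concrete, here is a dependency-free sketch. It does not call scikit-learn; instead it re-implements the weighting that scikit-learn documents as the `TfidfVectorizer` default (raw term counts, smoothed IDF of the form ln((1 + n) / (1 + df)) + 1, and L2 normalization of each document vector) and then sums each term's weight over all documents. Whether the authors kept these defaults is not stated in the text, so that is an assumption, and the function name `aggregated_tfidf` is hypothetical.

```python
import math
from collections import Counter


def aggregated_tfidf(documents):
    """Sum L2-normalized TF-IDF weights per term across all documents.

    `documents` is a list of token lists (one list per article title).
    """
    n_docs = len(documents)

    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    # Smoothed inverse document frequency, one value per term.
    idf = {
        term: math.log((1 + n_docs) / (1 + df)) + 1.0
        for term, df in doc_freq.items()
    }

    totals = Counter()
    for tokens in documents:
        tf = Counter(tokens)
        weights = {term: count * idf[term] for term, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm == 0:
            continue  # empty document contributes nothing
        for term, weight in weights.items():
            totals[term] += weight / norm
    return dict(totals)
```

Because every title is normalized to unit length, a term's aggregated score reflects both how many titles contain it and how much of each title it accounts for, which is one reason a TF ranking and a TF-IDF ranking can differ.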

3) Network analysis

Network analysis is a set of techniques used to visualize relationships among actors and analyze the social structures that emerge from these interactions. From the perspective of network analysis, the relationships between variables contribute to the formation of underlying phenomena [20]. In this study, the top 50 nouns were selected to examine and visualize the relationships between keywords, as shown in Figures 3 and 4. The analysis was enhanced by incorporating missing OOV words using the LRNounExtractor_v2 algorithm. Only nouns that appeared at least 15 times and had a reliability score of 0.5 or higher were considered key terms.
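The thresholding rule at the end of this paragraph can be written as a small filter. In this sketch the `NounCandidate` tuple and the `select_key_terms` function are hypothetical stand-ins; they are not the Soynlp return type, and only the two cut-offs (frequency of at least 15 and reliability score of at least 0.5) come from the text.

```python
from typing import Dict, NamedTuple


class NounCandidate(NamedTuple):
    """Hypothetical record for one extracted noun candidate."""
    frequency: int
    score: float


def select_key_terms(
    candidates: Dict[str, NounCandidate],
    min_frequency: int = 15,
    min_score: float = 0.5,
) -> Dict[str, NounCandidate]:
    """Keep candidates meeting both the frequency and score thresholds."""
    return {
        word: cand
        for word, cand in candidates.items()
        if cand.frequency >= min_frequency and cand.score >= min_score
    }
```

Both conditions must hold, so a frequent candidate with a low reliability score is dropped just like a reliable but rare one.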

Figure 3

Results of network analysis utilizing the top 50 keywords based on term frequency. The results show a translation from the original Korean (A) to English (B), with personal names anonymized to ensure privacy.

Figure 4

Results of network analysis utilizing the top 50 keywords based on keyword importance. The results show a translation from the original Korean (A) to English (B), with personal names anonymized to ensure privacy.

An undirected, weighted graph G = (V, E) was constructed using the networkx library (version 3.1). In this graph, nodes (V) represent individual keywords, and edges (E) represent co-occurrences, indicating that two keywords appeared together within the same article title. The weight of the edges was determined by the frequency of co-occurrence, providing an intuitive representation of the relationship between keywords.
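The edge weights of such a co-occurrence graph can be computed with the standard library alone, as in this sketch. The function name `cooccurrence_edges` is hypothetical, and counting each keyword pair at most once per title is an assumption about how "appeared together within the same article title" was operationalized.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(title_tokens, vocabulary):
    """Count how many titles contain each unordered pair of keywords.

    `title_tokens` is a list of token lists (one per title) and
    `vocabulary` is the set of keywords that become graph nodes.
    Returns a Counter mapping (keyword_a, keyword_b) to the edge weight,
    with each pair stored in alphabetical order.
    """
    vocab = set(vocabulary)
    weights = Counter()
    for tokens in title_tokens:
        # De-duplicate within a title and sort so (a, b) == (b, a).
        present = sorted(set(tokens) & vocab)
        weights.update(combinations(present, 2))
    return weights
```

If networkx is available, the result can be loaded into an undirected graph with `G.add_weighted_edges_from((u, v, w) for (u, v), w in weights.items())`.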

Keyword clusters were identified using the Louvain algorithm from the community module (version 0.16), which detects communities by optimizing modularity for efficient clustering [21]. The weight and length of edges were inversely related; higher weights corresponded to shorter edge lengths, indicating stronger relationships between keywords. The network structure was visualized using the Spring layout algorithm, which arranges nodes based on the physical forces acting between them. Each cluster was visually distinguished by assigning distinct colors to the nodes of each community detected by the Louvain algorithm, facilitating clear differentiation between keyword clusters.
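The quantity that the Louvain method tries to maximize, modularity, can be computed directly for any given partition. The sketch here is not the Louvain algorithm itself; it only evaluates a partition using the weighted form Q = Σ_c [ W_c / m − (d_c / 2m)² ], where m is the total edge weight, W_c the weight of edges inside community c, and d_c the summed weighted degree of its nodes. The function name and input layout are assumptions, and the graph is assumed to have no self-loops.

```python
from collections import defaultdict


def modularity(edge_weights, community_of):
    """Weighted modularity of a partition of an undirected graph.

    `edge_weights` maps (u, v) pairs to positive weights, with each
    undirected edge listed once; `community_of` maps every node to a
    community label.
    """
    total = float(sum(edge_weights.values()))
    if total == 0:
        return 0.0

    internal = defaultdict(float)  # edge weight inside each community
    degree = defaultdict(float)    # summed weighted degree per community
    for (u, v), weight in edge_weights.items():
        cu, cv = community_of[u], community_of[v]
        degree[cu] += weight
        degree[cv] += weight
        if cu == cv:
            internal[cu] += weight

    return sum(
        internal[c] / total - (degree[c] / (2.0 * total)) ** 2
        for c in degree
    )
```

A partition that groups densely co-occurring keywords scores higher than one that lumps everything together, which is the signal Louvain exploits. For the layout step, networkx's `spring_layout` accepts a `weight` argument, so heavier edges pull their endpoints closer together.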

III. Results

Frequency analysis quantifies the number of cancer-related articles published on the portal throughout 2023. A higher frequency indicates a greater number of articles addressing cancer during specific periods, reflecting heightened attention to particular issues. In total, there were 19,578 news articles containing the keyword “cancer” (“am” in Korean). A monthly breakdown showed an average of 1,631 cancer-related articles per month (Figure 1), with the highest frequency in September (1,946 articles) and the lowest in February (1,371 articles).

In 2023, a total of 132,456 keywords were identified across all cancer-related news articles. Table 1 lists the top 20 most frequently occurring keywords, with the original Korean terms translated into English. The most common keywords included “cure,” “struggle,” “patients,” “lung cancer,” “antitumor,” “hospital,” “breast cancer,” and “pediatric cancer.” Notably, “cure” appeared 2,218 times, “struggle” 1,844 times, and “patients” 1,777 times. Among the types of cancer, “lung cancer” was mentioned 1,652 times and “breast cancer” 1,235 times, making them the most frequently discussed. The TF-IDF analysis assigned the highest importance score to “struggle” (1064.172), followed by “lung cancer” (839.988) and “breast cancer” (744.840). While there was a slight difference in the ranking of terms between TF and TF-IDF, both analyses consistently emphasized these key terms.

Table 1

Top 20 keywords by frequency

Figure 2 visualizes the top 100 keywords using a word cloud representation. Table 2 displays the monthly frequency of the top 20 keywords, highlighting not only the major cancer-related topics for 2023 but also the dominant terms for each specific month. All keywords have been translated from Korean into English to enhance clarity.

Table 2

Monthly frequency results of the top 20 keywords

Network analysis of the top 50 keywords, based on term frequency, identified clusters of related terms depicted in distinct colors; proximity within the figure indicates the degree of relevance (Figure 3). We identified four distinct clusters, each centered on different themes: treatment-related discussions including new drug development, celebrity-related issues, major cancer concerns, and factors contributing to cancer such as carcinogenesis. Keywords like a celebrity’s name, “donation,” “carcinogen,” and “vaccine” served as hubs, demonstrating strong direct connections to other nodes. A similar network analysis, focusing on keyword importance, is shown in Figure 4, with a comparable classification.

IV. Discussion

Accurate and reliable information about cancer is crucial for patients to manage their condition effectively [3]. For cancer communication to effectively disseminate information, it is essential to understand the context in which this information is obtained. Studies have indicated that health information on social media often lacks quality and can be biased, potentially leading to harmful consequences for users [22]. Monitoring the dissemination of online information and reviewing related research are crucial steps in addressing this issue. Therefore, this study aims to collect and analyze internet news articles posted on major portal sites in South Korea throughout 2023, to identify the cancer-related information accessible to and consumed by cancer patients. By examining the information that has been consumed, this study seeks to establish a foundation for determining the information that is still needed.

Based on our results, the majority of the top-linked and exposed keywords were related to common cancers such as lung and breast cancer. This suggests that most articles focus on common cancers, indicating a lack of information on rare cancers despite the demand for it. This implies that articles aimed at capturing attention on the basis of public interest and importance, rather than reflecting the actual demand for and facts about rare cancers, are circulating rapidly [23]. This trend could potentially exacerbate the information gap regarding rare cancers, leading to discrepancies in the volume, accuracy, and relevance of the information provided [24]. Furthermore, our network analysis revealed that when related keywords were connected, articles featuring celebrity gossip were more prevalent than those providing factual information. This underscores a significant limitation in the dissemination of information via internet articles.

According to our findings, another significant keyword for 2023 was “pediatric cancer” (childhood cancer). This term was frequently associated with content that focused on celebrities’ donations to childhood cancer patients, highlighting public interest in such philanthropic acts. Additionally, there have been numerous discussions aimed at improving the medical system for children, particularly due to concerns about the shortage of dedicated personnel for childhood cancer. Despite the well-developed childhood cancer treatment environment in South Korea, the provinces face a significant lack of dedicated treatment facilities. Efforts are underway to address this issue, including proposals to establish a pediatric cancer base hospital in the region to facilitate the efficient formation of a pediatric cancer treatment team [25]. Thus, articles addressing these issues dominated the related content landscape.

News articles often feature content that is easily accessible and gossip-oriented, which differs from the information sought by the general public, including cancer patients. This discrepancy is also reflected in the deviation from keywords commonly used in online cafés frequented by cancer patients. This shift can be attributed to news outlets no longer merely delivering information, but rather engaging in the creation and dissemination of content to garner wider interest across various online platforms [26]. By examining the results of the network analysis, it becomes clear that when each node is clustered, the network centers around interest-inducing keywords such as new drug announcements and celebrity content. In other words, many articles received more clicks for their entertainment value than for the informative content they provided. The abundance of related articles indicates a strong public interest in these topics. However, mere interest does not guarantee accurate information, and caution is needed.

Additionally, the results reflect a substantial public interest in cancer-related keywords, particularly those that became significant issues in South Korea in 2023. Lung cancer has received heightened attention due to various concerns, including the health risks associated with humidifier disinfectants and the incidence of lung cancer among school cafeteria workers. Humidifier disinfectants, widely used in South Korean homes to inhibit microbial growth in humidifier tanks, have become controversial after studies showed that inhaling these chemicals could cause severe lung damage [27]. In 2023, public concern escalated when a potential link between these disinfectants and lung cancer was officially recognized. Additionally, exposure to cooking oil fumes generated from frying at high temperatures has been linked to lung cancer, highlighting occupational health risks. The issue of occupational lung cancer among school cafeteria workers has also gained considerable attention in South Korea [28]. Thus, lung cancer-related issues were prominent throughout 2023.

The frequent mention of certain cancer-related keywords in news articles on portal sites can often be linked to sociocultural factors, such as the well-publicized cancer struggles of celebrities and the recent discovery of carcinogens. In South Korean society, the public’s fascination with celebrities significantly influences their attitudes and behaviors, as people often experience a sense of connection and belonging through their perceived relationships with these public figures [29]. Moreover, the heightened exposure of socioeconomically disadvantaged groups to environmental carcinogens further increases the visibility of these issues [30]. The extensive media coverage of these topics indicates a growing public concern across different demographic groups. Analyzing the prevalence of these keywords offers valuable insights into the types of information that capture public attention, underscoring the urgent need for accurate and reliable health information dissemination to ensure effective public health communication.

This study has several limitations that highlight potential areas for further research. First, the data collection was confined to news articles from specific internet portals. However, by focusing on Naver, the leading portal site in Korea, we ensured broad coverage of major national issues. Additionally, this study was limited to news articles, which may have restricted the diversity of information sources. Future studies could broaden the scope by incorporating content from a wider range of platforms. It is important to note, however, that many online platforms, such as internet cafés, contain personal information, which could compromise data integrity. Therefore, the focus on news articles, which typically provide more objective cancer-related information, facilitates the extraction of valuable insights. Lastly, this research is limited to articles published in 2023. While some details may vary in subsequent years, the general trends identified are expected to remain relevant. Thus, this study provides important insights into cancer-related information consumption and serves as a foundation for future inquiries in this area.

This study identified patterns in the consumption of cancer-related information and highlighted topics of public interest through keyword analysis in 2023. The findings from this text mining analysis provide essential foundational data that can inform future policy directions and strategies, enabling a more proactive response to misinformation. The use of network analysis facilitated the identification of associations between keywords. Further research should focus on monitoring both emerging keywords and those frequently used in cancer-related content. Ultimately, this study underscores the importance of understanding the nature of cancer-related information consumed by the public and offers valuable insights that can guide official policies and healthcare practices.

Notes

Conflict of Interest

No potential conflict of interest relevant to this article was reported.

Acknowledgments

This work was supported by the National Cancer Center Grant (No. 2410580-1). The funding source had no role in the study design or data interpretation.

References

1. Khoshnood Z, Dehghan M, Iranmanesh S, Rayyani M. Informational needs of patients with cancer: a qualitative content analysis. Asian Pac J Cancer Prev 2019;20(2):557–62. https://doi.org/10.31557/APJCP.2019.20.2.557.
2. Gage-Bouchard EA, LaValley S, Warunek M, Beaupin LK, Mollica M. Is cancer information exchanged on social media scientifically accurate? J Cancer Educ 2018;33(6):1328–32. https://doi.org/10.1007/s13187-017-1254-z.
3. Kim JH, Oh KH, Shin HY, Jun JK. How cancer patients get fake cancer information: from TV to YouTube, a qualitative study focusing on fenbendazole scandle. Front Oncol 2022;12:942045. https://doi.org/10.3389/fonc.2022.942045.
4. Yoon HY, You KH, Kwon JH, Kim JS, Rha SY, Chang YJ, et al. Understanding the social mechanism of cancer misinformation spread on YouTube and lessons learned: infodemiological study. J Med Internet Res 2022;24(11):e39571. https://doi.org/10.2196/39571.
5. Korhonen A, Seaghdha DO, Silins I, Sun L, Hogberg J, Stenius U. Text mining for literature review and knowledge discovery in cancer risk assessment and research. PLoS One 2012;7(4):e33427. https://doi.org/10.1371/journal.pone.0033427.
6. Gaikwad SV, Chaugule A, Patil P. Text mining methods and techniques. Int J Comput Appl 2014;85(17):42–5.
7. Spasic I, Livsey J, Keane JA, Nenadic G. Text mining of cancer-related information: review of current status and future directions. Int J Med Inform 2014;83(9):605–23. https://doi.org/10.1016/j.ijmedinf.2014.06.009.
8. Zhu F, Patumcharoenpol P, Zhang C, Yang Y, Chan J, Meechai A, et al. Biomedical text mining and its applications in cancer research. J Biomed Inform 2013;46(2):200–11. https://doi.org/10.1016/j.jbi.2012.10.007.
9. Johnson SB, Bylund CL. Identifying cancer treatment misinformation and strategies to mitigate its effects with improved radiation oncologist-patient communication. Pract Radiat Oncol 2023;13(4):282–5. https://doi.org/10.1016/j.prro.2023.01.007.
10. Chen L, Wang P, Ma X, Wang X. Cancer communication and user engagement on Chinese social media: content analysis and topic modeling study. J Med Internet Res 2021;23(11):e26310. https://doi.org/10.2196/26310.
11. Blei DM. Probabilistic topic models. Commun ACM 2012;55(4):77–84. https://doi.org/10.1145/2133806.2133826.
12. Choi DO. Internet portal competition and economic incentive to tailor news slant [Internet] Seoul, Korea: Korea Development Institute; 2017. [cited at 2024 Oct 1]. Available from: https://www.kdi.re.kr/research/reportView?&pub_no=15184.
13. Oh SO, Park A, Choi JH. Digital news report in Korea 2021 [Internet] Seoul, Korea: Korea Press Foundation; 2021. [cited at 2024 Oct 1]. Available from: https://www.kpf.or.kr/front/research/selfDetail.do?seq=592216.
14. Park S, Bier LM, Park HW. The effects of infotainment on public reaction to North Korea using hybrid text mining: content analysis, machine learning-based sentiment analysis, and co-word analysis. Prof Inf 2021;30(3):e300306. https://doi.org/10.3145/epi.2021.may.06.
15. Shamshiri A, Ryu KR, Park JY. Text mining and natural language processing in construction. Autom Constr 2024;158:105200. https://doi.org/10.1016/j.autcon.2023.105200.
16. Zanini N, Dhawan V. Text mining: an introduction to theory and some applications. Res Matters 2015;(19):38–44. https://doi.org/10.17863/CAM.100316.
17. Kao A, Poteet S. Text mining and natural language processing: introduction for the special issue. ACM SIGKDD Explor Newsl 2005;7(1):1–2. https://doi.org/10.1145/1089815.1089816.
18. Lochter JV, Silva RM, Almeida TA. Deep learning models for representing out-of-vocabulary words. In: Cerri R, Prati RC, eds. Intelligent systems. Cham, Switzerland: Springer; 2020. p. 418–34. https://doi.org/10.1007/978-3-030-61377-8_29.
19. Park JY, Lee J, Hong B. Keyword network analysis of infusion nursing from posts on the Q&A board in the Intravenous Nurses Café. Healthc Inform Res 2023;29(1):75–83. https://doi.org/10.4258/hir.2023.29.1.75.
20. Hevey D. Network analysis: a brief overview and tutorial. Health Psychol Behav Med 2018;6(1):301–28. https://doi.org/10.1080/21642850.2018.1521283.
21. Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. J Stat Mech 2008;2008(10):P10008. https://doi.org/10.1088/1742-5468/2008/10/P10008.
22. Loeb S, Sengupta S, Butaney M, Macaluso JN Jr, Czarniecki SW, Robbins R, et al. Dissemination of misinformative and biased information about prostate cancer on YouTube. Eur Urol 2019;75(4):564–7. https://doi.org/10.1016/j.eururo.2018.10.056.
23. Shin HS, Lee YJ. Journalists’ awareness of misinformation issues: focused on in-depth interviews. Korean J Journal Commun Stud 2021;65(4):239–72.
24. Desplenter FA, Laekeman GJ, De Coster S, Simoens SR; VZA Psychiatry Research Group. Information on antidepressants for psychiatric inpatients: the divide between patient needs and professional practice. Pharm Pract (Granada) 2013;11(2):81–9. https://doi.org/10.4321/s1886-36552013000200004.
25. Ministry of Health and Welfare. Develop a plan to establish a pediatric cancer treatment system ensuring access to treatment for pediatric cancer patients at hospitals near their residence [Internet] Sejong, Korea: Ministry of Health and Welfare; 2023. [cited at 2024 Oct 1]. Available from: https://www.mohw.go.kr/board.es?mid=a10503010100&bid=0027&act=view&list_no=377367.
26. Im YH, Kim E, Kim KH, Kim A. News perceptions and uses among online-news users. Korean J Journal Commun Stud 2008;52(4):179–204.
27. Hong M, Ju MJ, Yoon J, Lee W, Lee S, Jo EK, et al. Exposures to humidifier disinfectant and various health conditions in Korean based on personal exposure assessment data of claimants for compensation. BMC Public Health 2023;23(1):1800. https://doi.org/10.1186/s12889-023-16389-x.
28. Kim M, Kim Y, Kim AR, Kwon WJ, Lim S, Kim W, et al. Cooking oil fume exposure and Lung-RADS distribution among school cafeteria workers of South Korea. Ann Occup Environ Med 2024;36:e2. https://doi.org/10.35371/aoem.2024.36.e2.
29. Lee S, Jeong EL. An integrative approach to examining the celebrity endorsement process in shaping affective destination image: a K-pop culture perspectives. Tour Manag Perspect 2023;48:101150. https://doi.org/10.1016/j.tmp.2023.101150.
30. Larsen K, Rydz E, Peters CE. Inequalities in environmental cancer risk and carcinogen exposures: a scoping review. Int J Environ Res Public Health 2023;20(9):5718. https://doi.org/10.3390/ijerph20095718.

Article information Continued

Figure 1

Number of relevant news articles published each month.

Figure 2

Results of keyword extraction by frequency. The results show a translation from the original Korean (A) to English (B), with personal names anonymized to ensure privacy.

Figure 3

Results of network analysis using the top 50 keywords by term frequency. Panel A shows the original Korean keywords and panel B their English translation; personal names are anonymized to protect privacy.

Figure 4

Results of network analysis using the top 50 keywords by keyword importance (TF-IDF). Panel A shows the original Korean keywords and panel B their English translation; personal names are anonymized to protect privacy.

Table 1

Top 20 keywords by frequency

| Rank | Keyword (TF) | Frequency | Keyword (TF-IDF) | Importance |
|---|---|---|---|---|
| 1 | Cure | 2,218 | Struggle | 1064.172 |
| 2 | Struggle | 1,844 | Lung cancer | 839.988 |
| 3 | Patients | 1,777 | Breast cancer | 744.840 |
| 4 | Lung cancer | 1,652 | Cure | 644.143 |
| 5 | Antitumor | 1,308 | Pediatric cancer | 639.642 |
| 6 | Hospital | 1,305 | Patients | 631.763 |
| 7 | Breast cancer | 1,235 | Cancer-fight | 543.573 |
| 8 | Pediatric cancer | 1,153 | Colon cancer | 479.899 |
| 9 | Antitumor-agent | 1,112 | Antitumor-agent | 445.275 |
| 10 | Diagnosis | 1,071 | Develop | 435.988 |
| 11 | Surgery | 963 | Surgery | 427.499 |
| 12 | Develop | 884 | Diagnosis | 422.857 |
| 13 | Therapy drug | 872 | Therapy drug | 373.875 |
| 14 | Cancer-fight | 833 | Pancreatic cancer | 372.349 |
| 15 | Colon cancer | 765 | Antitumor | 348.769 |
| 16 | Substance | 741 | Death | 346.532 |
| 17 | Bio | 740 | Risk | 335.604 |
| 18 | New drug | 698 | Donation | 330.072 |
| 19 | Clinical | 672 | Liver cancer | 321.263 |
| 20 | Health | 663 | Gastric cancer | 302.215 |

The results show a translation from the original Korean to English.

TF: term frequency, IDF: inverse document frequency.

Table 2

Monthly frequency results of the top 20 keywords

| Rank | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Antitumor | Patients | Cure | Cure | Struggle | Cure | Carcinogen | Cure | Lung cancer | Breast cancer | Struggle | Struggle |
| 2 | Struggle | Cure | Lung cancer | Patients | Cure | Lung cancer | Aspartame | Struggle | Cure | Cure | Cure | Lung cancer |
| 3 | Cure | Diagnosis | Cafeteria | Struggle | Patients | Struggle | Substance | Yoon D | Struggle | Patients | Patients | Cure |
| 4 | Patients | Struggle | Patients | Surgery | Pediatric cancer | Cancer-fight | Patients | Recovery | Patients | Lung cancer | Breast cancer | Patients |
| 5 | Pediatric cancer | Antitumor | Therapy drug | Lung cancer | Breast cancer | Patients | Potential | Patients | Hospital | Antitumor | Colon cancer | Pediatric cancer |
| 6 | Breast cancer | Lung cancer | Antitumor | Antitumor | Hospital | Hospital | Lung cancer | Diagnosis | Pediatric cancer | Hospital | Antitumor | Breast cancer |
| 7 | Cancer-fight | Surgery | Diagnosis | Therapy drug | Antitumor | Breast cancer | Hospital | Lung cancer | Pancreatic cancer | Diagnosis | Surgery | Surgery |
| 8 | Diagnosis | Develop | Hospital | Hospital | Colon cancer | Insurance | Cure | Antitumor | Blood cancer | Struggle | Death | Antitumor |
| 9 | Park S | Breast cancer | School | Blood cancer | Develop | Death | Pediatric cancer | Develop | Byun H | Surgery | Hospital | Park S |
| 10 | Seo J | Therapy drug | Develop | Develop | Gastric cancer | Surgery | Colon cancer | Pediatric cancer | Breast cancer | Liver cancer | Lung cancer | Hospital |
| 11 | Surgery | Cancer-fight | Breast cancer | Clinical | Kim W | Antitumor | Struggle | Hospital | Diagnosis | Therapy drug | Risk | Diagnosis |
| 12 | Develop | Health | Struggle | Pediatric cancer | Blood cancer | Substance | Risk | Colon cancer | Pass away | New drug | Diagnosis | Donation |
| 13 | Antitumor-agent | Wife | Research | Health | New drug | Carcinogen | Antitumor | Cancer-fight | Antitumor | Clinical | Pancreatic cancer | Antitumor-agent |
| 14 | Hospital | Hospital | Prevention | New drug | Donation | Therapy drug | Breast cancer | Confession | Health | Pancreatic cancer | Pediatric cancer | New drug |
| 15 | Pancreatic cancer | Antitumor-agent | Worker | Bio | Therapy drug | Diagnosis | Classification | Breast cancer | Antitumor-agent | Health | Therapy drug | Therapy drug |
| 16 | Lung cancer | Husband | Screening | Cancer-fight | Surgery | Pediatric cancer | Diagnosis | Choi P | Yoon D | Research | Effect | Death |
| 17 | Donation | Clinical | Clinical | Announcement | Nasopharyngeal cancer | Clinical | Develop | Liver cancer | Develop | Pediatric cancer | Cancer-fight | Cancer-fight |
| 18 | Jeong M | Blood | Health | Effect | Lung cancer | Potential | New drug | Substance | World | Antitumor-agent | Antitumor-agent | Colon cancer |
| 19 | Tongue cancer | New drug | Antitumor-agent | Diagnosis | Jeon Y | Risk | Liver cancer | Month | Actor | Announcement | Clinical | Develop |
| 20 | General | World | Announcement | Vaccine | Diagnosis | Antitumor-agent | Bio | Antitumor-agent | Therapy drug | Immunotherapy | Oh C | Survival |

The results show a translation from the original Korean to English with personal names anonymized to ensure privacy.