Development and Validation of Adaptable Skin Cancer Classification System Using Dynamically Expandable Representation

Article information

Healthc Inform Res. 2024;30(2):140-146
Publication date (electronic) : 2024 April 30
doi :
Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, Korea
Corresponding Author Yu Rang Park, Department of Biomedical Systems Informatics, Yonsei University College of Medicine, 50-1 Yonsei-ro, Seodaemun-gu, Seoul 03722, Korea. Tel: +82-2-2228-2493, E-mail: (
Received 2023 December 30; Revised 2024 April 24; Accepted 2024 April 24.



Skin cancer is a prevalent type of malignancy, necessitating efficient diagnostic tools. This study aimed to develop an automated skin lesion classification model using the dynamically expandable representation (DER) incremental learning algorithm. This algorithm adapts to new data and expands its classification capabilities, with the goal of creating a scalable and efficient system for diagnosing skin cancer.


The DER model with incremental learning was applied to the HAM10000 and ISIC 2019 datasets. Validation involved two steps: initially, training and evaluating the HAM10000 dataset against a fixed ResNet-50; subsequently, performing external validation of the trained model using the ISIC 2019 dataset. The model’s performance was assessed using precision, recall, the F1-score, and area under the precision-recall curve.


The developed skin lesion classification model demonstrated high accuracy and reliability across various types of skin lesions, achieving a weighted-average precision, recall, and F1-score of 0.918, 0.808, and 0.847, respectively. The model’s discrimination performance was reflected in an average area under the curve (AUC) value of 0.943. Further external validation with the ISIC 2019 dataset confirmed the model’s effectiveness, as shown by an AUC of 0.911.


This study presents an optimized skin lesion classification model based on the DER algorithm, which shows high performance in disease classification with the potential to expand its classification range. The model demonstrated robust results in external validation, indicating its adaptability to new disease classes.

I. Introduction

Skin cancer is one of the most common types of human malignancies. Its diagnosis typically involves visual examination, dermoscopy, and histopathological analysis. However, due to the high volume of daily examinations and the complexity of disease detection, delays in diagnosis can occur. Existing classification systems, developed in previous studies, consist of deep learning models trained under fixed conditions and are limited to classifying a small number of diseases. As a result, there is a growing demand for efficient and rapid diagnostic tools. Machine learning and deep learning models, particularly those analyzing medical images, have the potential to effectively meet this need [1–3]. To train deep learning models for accurate disease diagnosis and analysis based on medical images, a substantial amount of data is required. Additionally, as new diseases are discovered or existing diseases exhibit variations, hospitals encounter an increasingly diverse array of disease types. Consequently, classification models must be continually updated and expand their classification scope to remain relevant in clinical applications.

In the field of medical diagnostics, the development and implementation of computer-aided diagnosis (CAD) tools present significant challenges. One critical aspect is the need for incremental learning, especially when dealing with diverse pathologies and imaging machines. Creating a comprehensive dataset that includes various diseases and imaging modalities can be a time-consuming process, often taking several years to collect and annotate [4].

In this study, we propose a classification model that employs dynamically expandable representation (DER), a state-of-the-art incremental learning algorithm capable of scaling in the image classification domain [5]. The DER model addresses the challenges associated with the fine-grained variability in the appearance of skin cancers by continually learning and adapting to new data. It enables the efficient integration of new disease types into the classification framework, ensuring that the model remains up-to-date and capable of accurately classifying a wide range of skin cancers.

The objective of this study was to develop an automated skin lesion classification model using DER and to evaluate its performance. By leveraging the strengths of deep learning and incremental learning techniques, we aim to improve the efficiency and accuracy of skin cancer diagnosis. The model was trained on a large dataset of annotated skin lesion images, including diverse disease types, and its performance was evaluated using various metrics.

II. Methods

1. Data Construction

Two datasets were utilized in this study. The first, the HAM10000 (Human Against Machine with 10000 training images) dataset, included images categorized into distinct classes such as actinic keratoses, basal cell carcinoma, dermatofibroma, melanocytic nevi, melanoma, and vascular lesions [6]. The second dataset, International Skin Imaging Collaboration (ISIC) 2019, contained images classified into the same six categories as the first, with the addition of seborrheic keratosis and squamous cell carcinoma [7]. The datasets were divided into three subsets: training, validation, and test data, with the distribution set at 80%, 10%, and 10%, respectively. These datasets were used to train a model featuring an incremental learning structure. The effectiveness of the proposed method was subsequently evaluated.
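The 80%/10%/10% division can be illustrated with a minimal sketch. The text does not state whether the split was stratified by class, so the per-class splitting below is an assumption, as are the function name `stratified_split`, the seed, and the toy label counts.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Return (train, val, test) index lists, splitting each class separately.

    Stratification is an assumption; the article only reports the 80/10/10 ratio.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = int(round(fractions[0] * len(indices)))
        n_val = int(round(fractions[1] * len(indices)))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test

# Toy labels standing in for lesion classes (counts are illustrative only).
labels = ["nv"] * 100 + ["mel"] * 20 + ["bcc"] * 10
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))
```

Splitting within each class keeps rare lesion types represented in all three subsets, which matters when class sizes differ as widely as they do in dermoscopic datasets.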

2. Experimental Setting

All experiments were implemented using PyTorch 1.12.1. The models were trained on a machine equipped with four NVIDIA Quadro P5000 GPUs, CUDA 11.2, 64 GB of memory, and an Intel Xeon Platinum 8253 CPU operating at 2.2 GHz.

3. Data Augmentation

To enhance the generalization capabilities of the convolutional neural network model, we employed extensive data augmentation techniques during the training phase. Initially, each skin image was resized to a dimension of 224 × 224 pixels to serve as the input for the model. We applied the following data augmentation steps:

  • Horizontal flip: Each image was independently flipped horizontally with a probability of 0.5. This augmentation technique introduces variations in orientation, enhancing the model’s ability to process images with different spatial orientations.

  • Random contrast and brightness change: Independent random adjustments were made to the contrast and brightness of each image. These adjustments were confined to a range of 20% in both the positive and negative directions and were applied with a probability of 0.5. This augmentation technique captures variations in illumination conditions within the dataset.

  • Distortion: Distortion was applied to the images with a probability of 0.2. Various types of distortions, including optical, grid, and elastic distortion, were utilized. These techniques introduce geometric transformations to the images, simulating variations in image acquisition conditions and enhancing the model’s robustness to such variations.

These data augmentation techniques collectively contribute to expanding the diversity of the training dataset, enabling the model to learn more robust and generalized representations. By incorporating these variations into the training process, the model becomes better equipped to handle various image conditions and variations commonly encountered in real-world scenarios [8,9].
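The first two augmentation steps can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function name `augment` and the exact brightness and contrast formulas are hypothetical, and the distortion step is omitted for brevity.

```python
import numpy as np

def augment(image, rng):
    """Randomly flip and jitter an HxWx3 uint8 image (illustrative formulas)."""
    out = image.astype(np.float32)
    if rng.random() < 0.5:                            # horizontal flip, p = 0.5
        out = out[:, ::-1, :]
    if rng.random() < 0.5:                            # brightness/contrast, p = 0.5
        contrast = 1.0 + rng.uniform(-0.2, 0.2)       # +/-20% contrast (assumed form)
        brightness = rng.uniform(-0.2, 0.2) * 255.0   # +/-20% of the full intensity range
        mean = out.mean()
        out = (out - mean) * contrast + mean + brightness
    return np.clip(out, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
# Random pixels stand in for a lesion image already resized to 224 x 224.
image = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
augmented = augment(image, rng)
print(augmented.shape, augmented.dtype)
```

Because each transform is drawn independently per image, the same source image can yield a different variant in every training epoch.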

4. Incremental Learning

The DER algorithm for skin disease classification employs a multi-step approach, in which each step involves training and validation using data specific to that step. The features learned in previous steps are combined with those from the current step to create a super-feature representation, which enables the model to classify a broader range of classes. In each step of the incremental learning process, an additional training loss, referred to as the auxiliary loss, is calculated. The auxiliary loss helps classify the classes from both previous and current steps; by incorporating it, the model learns different features that facilitate the expansion of the classifiable classes [5] (Figure 1).

This incremental learning strategy is crucial for adapting the model to an increasing number of disease classes over successive steps. The model progressively accumulates knowledge from previous steps while incorporating new information from the current step, so it maintains its performance on previously learned classes while extending its capabilities to new ones. This approach allows the model to adapt continuously to the evolving landscape of skin lesion classification.

Figure 1

Overview of the process of dynamically expandable representation for skin lesion classification.
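The feature concatenation and auxiliary loss described in this section can be sketched in NumPy. The sketch follows our reading of the DER paper [5], in which the auxiliary classifier sees only the newly added feature and groups all old classes into a single label. The linear "extractors", the dimensions, and the loss weight of 1.0 are hypothetical stand-ins, not the study's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_extractor(in_dim, feat_dim):
    """Hypothetical stand-in for a CNN backbone: one random linear map plus ReLU."""
    weight = rng.normal(scale=0.1, size=(in_dim, feat_dim))
    return lambda x: np.maximum(x @ weight, 0.0)

def cross_entropy(logits, targets):
    """Mean softmax cross-entropy for integer class targets."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

in_dim, feat_dim = 32, 8
n_old, n_new = 2, 2                                 # a step that adds two classes to two old ones

old_extractor = make_extractor(in_dim, feat_dim)    # from the previous step (kept frozen)
new_extractor = make_extractor(in_dim, feat_dim)    # added at the current step

x = rng.normal(size=(5, in_dim))                    # toy mini-batch
y = np.array([0, 1, 2, 3, 2])                       # labels over all four classes

# Super-feature: concatenate the features from every step's extractor.
new_feat = new_extractor(x)
super_feat = np.concatenate([old_extractor(x), new_feat], axis=1)

# The main classifier sees the super-feature and predicts all classes seen so far.
w_main = rng.normal(scale=0.1, size=(super_feat.shape[1], n_old + n_new))
main_loss = cross_entropy(super_feat @ w_main, y)

# The auxiliary classifier sees only the new feature; all old classes share label 0.
aux_targets = np.where(y < n_old, 0, y - n_old + 1)
w_aux = rng.normal(scale=0.1, size=(feat_dim, n_new + 1))
aux_loss = cross_entropy(new_feat @ w_aux, aux_targets)

total_loss = main_loss + 1.0 * aux_loss             # the weight 1.0 is an arbitrary choice
print(super_feat.shape, aux_targets.tolist())
```

The super-feature grows by one extractor's width per step, which is what lets the classifier's label space expand without overwriting earlier representations.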

5. Study Design

In this study, we developed and validated a skin cancer classification model using the HAM10000 dataset along with an external validation dataset from ISIC 2019. The HAM10000 dataset comprises images of various skin cancer types, including actinic keratoses, basal cell carcinoma, dermatofibroma, melanocytic nevi, melanoma, and vascular lesions.

The model was trained in a stepwise manner, with each step focusing on two specific classes and followed by a performance assessment. The training process was divided into three steps, allowing the model to progressively learn and classify different types of skin lesions. In the first step, the model classified actinic keratoses and basal cell carcinoma. In the second step, it focused on dermatofibroma and melanocytic nevi. The third step involved training on melanoma and vascular lesions. To strengthen the model’s classification capabilities, we introduced an additional incremental step that involved training on external data. This new step incorporated seborrheic keratosis and squamous cell carcinoma from the ISIC 2019 dataset, which were not included in the initial three-step training process.

For validation, we utilized the HAM10000 dataset to assess the model’s performance in comparison to a ResNet-50-based neural network model from a previous study [10]. Subsequently, we applied incremental learning to integrate new classes from the ISIC 2019 dataset, followed by external validation using the same dataset.

This study was designed to develop a robust and versatile model for classifying skin lesions, capable of accurately identifying a wide range of lesion types across various datasets.
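The stepwise class schedule described in this section can be written down as a small data structure. The list layout and variable names are illustrative choices; only the class groupings and datasets come from the text.

```python
# Class schedule taken from the study design; the list layout is an illustrative choice.
STEPS = [
    ("HAM10000", ["actinic keratoses", "basal cell carcinoma"]),
    ("HAM10000", ["dermatofibroma", "melanocytic nevi"]),
    ("HAM10000", ["melanoma", "vascular lesions"]),
    ("ISIC 2019", ["seborrheic keratosis", "squamous cell carcinoma"]),
]

known_classes = []
for step, (dataset, new_classes) in enumerate(STEPS, start=1):
    known_classes.extend(new_classes)
    print(f"step {step} ({dataset}): +{len(new_classes)} classes, "
          f"{len(known_classes)} classifiable in total")
```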

6. Model Evaluation

In this study, we used specific indicators to evaluate deep learning algorithms in medical image classification. We compared the performance of the DER-based model with a deep convolutional neural network, known as ResNet-50, using the same dataset. The classification performance for skin cancer was assessed using precision, recall, F1-score, and the area under the precision-recall curve [11]. The evaluation process involved calculating these metrics for both the training and validation datasets.
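As a minimal illustration of the support-weighted averages reported in this study, the sketch below computes weighted precision, recall, and F1-score from predicted and true labels. The function name `weighted_metrics` and the toy labels are hypothetical.

```python
from collections import Counter

def weighted_metrics(y_true, y_pred):
    """Support-weighted precision, recall, and F1-score over all true classes."""
    support = Counter(y_true)
    total = len(y_true)
    precision = recall = f1 = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        predicted = sum(1 for p in y_pred if p == cls)
        p_c = tp / predicted if predicted else 0.0   # per-class precision
        r_c = tp / n                                 # per-class recall
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = n / total                           # class share of the test set
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return precision, recall, f1

# Toy labels (illustrative only).
y_true = ["nv", "nv", "nv", "nv", "mel", "mel"]
y_pred = ["nv", "nv", "nv", "mel", "mel", "nv"]
precision, recall, f1 = weighted_metrics(y_true, y_pred)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

With support weighting, large classes dominate the average, so per-class values remain important for judging performance on rare lesion types.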

In addition to the performance evaluation metrics, this study also utilized gradient-weighted class activation mapping (Grad-CAM) to offer visual explanations for the skin lesion classification model [12]. Grad-CAM produces heatmaps that emphasize the areas of the input images that are most influential in the model’s decision-making process. These heatmaps illustrate the regions of interest and enhance the interpretability of the model’s classification decisions.
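The core Grad-CAM computation [12] can be sketched in a few lines of NumPy: the gradients of the class score are averaged over space to obtain one weight per channel, and the weighted sum of the activation maps is passed through a ReLU. In practice the activations and gradients would be captured from a convolutional layer of the trained network; here random arrays stand in for them, and the function name `grad_cam` is a hypothetical choice.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Return an (H, W) heatmap from (channels, H, W) activations and gradients."""
    weights = gradients.mean(axis=(1, 2))               # global-average-pool the gradients
    cam = np.tensordot(weights, activations, axes=1)    # weighted sum over channels
    cam = np.maximum(cam, 0.0)                          # keep only positive evidence (ReLU)
    if cam.max() > 0:
        cam = cam / cam.max()                           # scale to [0, 1] for display
    return cam

rng = np.random.default_rng(0)
activations = rng.random((4, 7, 7))      # placeholder feature maps
gradients = rng.normal(size=(4, 7, 7))   # placeholder gradients of the class score
heatmap = grad_cam(activations, gradients)
print(heatmap.shape)
```

The low-resolution heatmap is typically upsampled to the input size and overlaid on the original image for inspection.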

III. Results

In this study, we compared the performance of DER to that of ResNet-50 across various metrics using 1,015 images from the HAM10000 dataset. The comparative analysis showed that DER, an incremental learning algorithm, outperformed the ResNet-50 model in accuracy, weighted-average precision, and weighted-average F1-score, whereas ResNet-50 had the higher weighted-average recall (0.869 vs. 0.809) (Table 1). The DER-based model achieved a classification accuracy of 80.88%, exceeding the 73.55% accuracy of ResNet-50. The difference was most pronounced for precision (weighted-average precision: 0.919 for DER vs. 0.799 for ResNet-50). Receiver operating characteristic (ROC) curve analysis further highlighted the model’s discrimination capability across different classification thresholds, with DER achieving an average area under the curve (AUC) value of 0.943 (Figure 2). The best-performing category was vascular lesions, with an AUC of 0.987, while the least effective was actinic keratoses, with an AUC of 0.888.

Table 1

Performance comparison of dynamically expandable representation (DER) and ResNet-50 based skin cancer classification models: precision, recall, F1-score, and accuracy

Figure 2

Receiver operating characteristic curves and area under the curve (AUC) of the skin lesion classification model with the HAM10000 dataset.
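Per-class and average AUC values of the kind shown in Figure 2 can be computed with a one-vs-rest scheme, sketched below in plain Python. The article does not state how the average AUC was formed, so the unweighted mean over classes is an assumption; the function names and toy probabilities are hypothetical.

```python
def roc_auc(scores, positives):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, is_pos in zip(scores, positives) if is_pos]
    neg = [s for s, is_pos in zip(scores, positives) if not is_pos]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def one_vs_rest_auc(prob_rows, labels, classes):
    """Per-class AUC (class k vs. all others) and the unweighted mean."""
    aucs = {}
    for k, cls in enumerate(classes):
        scores = [row[k] for row in prob_rows]
        aucs[cls] = roc_auc(scores, [y == cls for y in labels])
    return aucs, sum(aucs.values()) / len(aucs)

# Toy predicted probabilities (illustrative only); columns follow `classes`.
classes = ["nv", "mel", "bcc"]
labels = ["nv", "nv", "mel", "bcc", "mel"]
prob_rows = [
    [0.7, 0.2, 0.1],
    [0.5, 0.4, 0.1],
    [0.3, 0.6, 0.1],
    [0.2, 0.2, 0.6],
    [0.6, 0.3, 0.1],
]
aucs, mean_auc = one_vs_rest_auc(prob_rows, labels, classes)
print({cls: round(v, 3) for cls, v in aucs.items()}, round(mean_auc, 3))
```

The pairwise-ranking form used here is equivalent to the area under the ROC curve and avoids constructing the curve explicitly, at the cost of quadratic time in the number of samples.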

To further validate the model’s performance, external validation was conducted using the ISIC 2019 dataset. ROC curve analysis demonstrated the model’s ability to discriminate across different classification thresholds, achieving an AUC value of 0.911 (Figure 3). Consistent with the HAM10000 results, vascular lesions were the best-performing skin condition in the ISIC 2019 dataset, with an AUC of 0.992, while the model performed worst for actinic keratosis, with an AUC of 0.815. The Grad-CAM results, which provided a visual representation of the skin lesion classification model, showed that the model clearly detected skin lesions (Figure 4), regardless of other image characteristics such as variations in original skin color or the presence of hair.

Figure 3

Receiver operating characteristic curves and area under the curve (AUC) of the skin lesion classification model for external validation

Figure 4

Visual explanations of the skin lesion classification model: comparison of original images (left) and gradient-weighted class activation mapping heatmap (right).

IV. Discussion

The primary contribution of this study is the development of an optimized skin cancer classification model based on the DER algorithm [5]. This model facilitates the progressive expansion of the classification range for skin lesions. It was trained and validated using the HAM10000 dataset [6], which includes six different skin diseases. For comparison, we used a convolutional neural network with the same ResNet-50 structure [10], trained once on fixed data rather than incrementally. The DER-based model was trained incrementally and achieved a classification accuracy of 80.88%, surpassing the 73.55% performance of the ResNet-50. In a previous study [13], skin cancer classification models using various neural networks were compared. The results showed the following accuracies: Xception at 78.45%, DenseNet-201 at 78.59%, InceptionResNet-V2 at 79.39%, GoogLeNet at 79.45%, and AlexNet at 79.65%. These figures are lower than the 80.88% achieved by the DER model proposed in this study.

The reason for DER’s superior performance over traditional models lies in its ability to retain and utilize information from previous samples to predict subsequent ones incrementally [14,15]. In contrast, traditional models do not recycle information across datasets; instead, they depend solely on the data within each dataset. This approach makes them particularly vulnerable to issues such as small sample sizes and labels that are not independent and identically distributed (non-IID) [16,17]. Through incremental learning, the model showed effectiveness in disease classification, as evidenced by an average AUC of 0.943 and a weighted average F1-score of 0.847. To further validate the model’s improved disease classification capabilities, external validation was performed using the ISIC 2019 dataset. During this phase, two new disease classes, seborrheic keratosis and squamous cell carcinoma, were incrementally trained and validated. The results from this external validation confirmed that the model successfully adapted to these new disease classes, achieving a competitive average AUC of 0.911. These findings underscore the model’s ability to expand its classification range and adapt to new disease types effectively. The incremental learning approach used in this study enabled the model to progressively accumulate knowledge from previous steps while integrating new information, thereby improving its ability to classify a diverse range of skin lesions. Additionally, the Grad-CAM images not only deepened our understanding of the model’s reasoning processes but also provided valuable insights for clinicians, enhancing their ability to assess the model’s reliability and verify its decision-making process [12].

We hypothesize that DER holds promise for fields that require scalability, particularly in medical screening and diagnostics. Dermatological conditions, including skin cancer, are likely to expand with the identification of new subtypes and broader diagnostic criteria. Given that the skin is one of the body’s largest organs, it provides a complex and diverse environment that is prone to disease manifestation [18]. The emergence of new environmental stressors, genetic variations, environmental pollutants, or changes in lifestyle habits may lead to the development of previously unknown skin diseases [19,20]. Moreover, advancements and refinements in medical technology offer opportunities to discover and accurately diagnose previously unrecognized dermatological conditions.

However, the developed model, while promising, has certain limitations and challenges that must be acknowledged. In this study, the use of imbalanced datasets for training led to variations in classification performance across different types of skin cancer. Further research is essential to determine the model’s generalizability to other datasets and its applicability in real-world clinical settings. Additionally, as new types of diseases emerge and existing ones evolve, continuous updates and adjustments to the model will be necessary.

In summary, this study implemented DER, an incremental learning method, to address the limitations associated with the resource-intensive retraining process observed in previous research [13,21]. This highlights the potential of the DER algorithm to enhance the classification of skin lesions and broaden the model’s ability to accurately identify various types of skin lesions. The results are significant for improving the efficiency and accuracy of skin cancer diagnoses. By providing clinicians with a reliable and efficient tool for the automated classification of skin lesions, the model could facilitate early detection, timely diagnosis, appropriate treatment decisions, and better outcomes for patients with skin cancer.


Conflict of Interest

No potential conflict of interest relevant to this article was reported.


This work was supported by the Bio-Industrial Technology Development Program (No. 20014841) funded by the Ministry of Trade, Industry & Energy (MOTIE, Korea).


1. Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 2017;542(7639):115–8.
2. Abedini M, Codella NC, Connell JH, Garnavi R, Merler M, Pankanti S, et al. A generalized framework for medical image classification and recognition. IBM J Res Dev 2015;59(2/3):1–18.
3. Haenssle HA, Fink C, Schneiderbauer R, Toberer F, Buhl T, Blum A, et al. Man against machine: diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists. Ann Oncol 2018;29(8):1836–42.
4. Kumar P, Srivastava MM. Example mining for incremental learning in medical imaging. In : Proceedings of 2018 IEEE Symposium Series on Computational Intelligence (SSCI); 2018 Nov 18–21; Bangalore, India. p. 48–51.
5. Yan S, Xie J, He X. DER: dynamically expandable representation for class incremental learning. In : Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2021 Jun 20–25; Nashville, TN, USA. p. 3014–23.
6. Tschandl P, Rosendahl C, Kittler H. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci Data 2018;5:180161.
7. Codella NC, Gutman D, Celebi ME, Helba B, Marchetti MA, Dusza SW, et al. Skin lesion analysis toward melanoma detection: a challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), hosted by the International Skin Imaging Collaboration (ISIC). In : Proceedings of 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI); 2018 Apr 4–7; Washington, DC, USA. p. 168–72.
8. Cubuk ED, Zoph B, Mane D, Vasudevan V, Le QV. AutoAugment: learning augmentation strategies from data. In : Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2019 Jun 15–20; Long Beach, CA, USA. p. 113–23.
9. Barata C, Celebi ME, Marques JS. Improving dermoscopy image classification using color constancy. IEEE J Biomed Health Inform 2015;19(3):1146–52.
10. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In : Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2016 Jun 27–30; Las Vegas, NV, USA. p. 770–8.
11. Hajian-Tilaki K. Receiver operating characteristic (ROC) curve analysis for medical diagnostic test evaluation. Caspian J Intern Med 2013;4(2):627–35.
12. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: visual explanations from deep networks via gradient-based localization. In : Proceedings of the IEEE International Conference on Computer Vision (ICCV); 2017 Oct 22–29; Venice, Italy. p. 618–26.
13. Popescu D, El-Khatib M, Ichim L. Skin lesion classification using collective intelligence of multiple neural networks. Sensors (Basel) 2022;22(12):4399.
14. Hasselmo ME. Avoiding catastrophic forgetting. Trends Cogn Sci 2017;21(6):407–8.
15. Kirkpatrick J, Pascanu R, Rabinowitz N, Veness J, Desjardins G, Rusu AA, et al. Overcoming catastrophic forgetting in neural networks. Proc Natl Acad Sci U S A 2017;114(13):3521–6.
16. Althnian A, AlSaeed D, Al-Baity H, Samha A, Dris AB, Alzakari N, et al. Impact of dataset size on classification performance: an empirical evaluation in the medical domain. Appl Sci 2021;11(2):796.
17. Vabalas A, Gowen E, Poliakoff E, Casson AJ. Machine learning algorithm validation with a limited sample size. PLoS One 2019;14(11):e0224365.
18. Lupton E. Skin: surface, substance, and design. New York (NY): Princeton Architectural Press; 2007.
19. Chen X, Wen J, Wu W, Peng Q, Cui X, He L. A review of factors influencing sensitive skin: an emphasis on built environment characteristics. Front Public Health 2023;11:1269314.
20. Celebi Sozener Z, Ozdel Ozturk B, Cerci P, Turk M, Gorgulu Akin B, Akdis M, et al. Epithelial barrier hypothesis: effect of the external exposome on the microbiome and epithelial barriers in allergic disease. Allergy 2022;77(5):1418–49.
21. Popescu D, El-Khatib M, El-Khatib H, Ichim L. New trends in melanoma detection using neural networks: a systematic review. Sensors (Basel) 2022;22(2):496.

Table 1

Performance comparison of dynamically expandable representation (DER) and ResNet-50 based skin cancer classification models: precision, recall, F1-score, and accuracy

                                   DER                            ResNet-50
                                   Precision  Recall  F1-score    Precision  Recall  F1-score
Types of skin lesions (n = 1,015)
  Actinic keratosis (n = 30)       0.400      0.333   0.364       0.490      0.330   0.390
  Basal cell carcinoma (n = 35)    0.632      0.686   0.658       0.440      0.610   0.510
  Dermatofibroma (n = 8)           0.556      0.625   0.588       0.000      0.000   0.000
  Melanocytic nevi (n = 883)       0.993      0.826   0.902       0.850      0.940   0.890
  Melanoma (n = 46)                0.219      0.891   0.352       0.450      0.300   0.370
  Vascular lesions (n = 13)        0.545      0.923   0.686       0.760      0.570   0.650
Weighted average                   0.919      0.809   0.848       0.799      0.869   0.828
Accuracy (%)                       80.88                          73.55
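
As a worked check of how the weighted-average row in Table 1 relates to the per-class rows, the short sketch below recomputes the DER averages by weighting each class by its number of test images. The values are copied from Table 1; the variable names are illustrative.

```python
# (support, precision, recall, F1-score) for the DER model, copied from Table 1.
DER_ROWS = {
    "Actinic keratosis": (30, 0.400, 0.333, 0.364),
    "Basal cell carcinoma": (35, 0.632, 0.686, 0.658),
    "Dermatofibroma": (8, 0.556, 0.625, 0.588),
    "Melanocytic nevi": (883, 0.993, 0.826, 0.902),
    "Melanoma": (46, 0.219, 0.891, 0.352),
    "Vascular lesions": (13, 0.545, 0.923, 0.686),
}

total = sum(row[0] for row in DER_ROWS.values())

def weighted(column):
    """Support-weighted mean of one metric column across the six classes."""
    return sum(row[0] * row[column] for row in DER_ROWS.values()) / total

precision, recall, f1 = weighted(1), weighted(2), weighted(3)
print(total, round(precision, 3), round(recall, 3), round(f1, 3))
# prints: 1015 0.919 0.809 0.848
```

Because melanocytic nevi account for 883 of the 1,015 images, the weighted averages track that class closely, which is why the weighted precision stays high despite the low precision for melanoma.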