Breast cancer is the most common cancer diagnosed in women, and microcalcification (MCC) clusters act as an early indicator. Thus, the detection of MCCs plays an important role in diagnosing breast cancer.
This paper presents a methodology for mammogram preprocessing and MCC detection. The preprocessing method employs automatic artefact deletion and pectoral muscle removal based on region-growing segmentation and polynomial contour fitting. The MCC detection method uses a convolutional neural network for region-of-interest (ROI) classification, along with morphological operations and wavelet reconstruction to reduce false positives (FPs).
The methodology was evaluated using the mini-MIAS and UTP datasets in terms of segmentation accuracy in the preprocessing phase, as well as sensitivity and the mean FP rate per image in the MCC detection phase. With the mini-MIAS dataset, the proposed methods achieved accuracy scores of 99% for breast segmentation and 95% for pectoral segmentation, a sensitivity score of 82% for MCC detection, and an FP rate per image of 3.27. With the UTP dataset, the methods achieved accuracy scores of 97% for breast segmentation and 91% for pectoral segmentation, a sensitivity score of 78% for MCC detection, and an FP rate per image of 0.74.
The proposed preprocessing method outperformed the state-of-the-art methods for breast segmentation and achieved relatively good results for pectoral muscle removal. Furthermore, the MCC detection module achieved the highest test accuracy in identifying potential ROIs with MCCs compared to other methods.
Breast cancer is considered to be one of the most common global health problems [
Microcalcifications (MCCs) can be found in mammograms and act as an early indicator of up to 50% of all non-palpable breast cancers. They are also present in about 93% of all cases of ductal carcinoma in situ. Thus, the detection of MCC clusters is crucial in the diagnosis of breast cancer [
Mammogram preprocessing remains a challenging task due to the noise and artefacts present in images, the complex shapes of the pectoral muscle contour, the high density of some tissues in the breast, the appearance of auxiliary folds, and pectoral muscle superposition with fibroglandular tissue [
Preprocessing techniques have been proposed to increase the contrast between MCCs and other high-intensity regions [
Extensive research has focused on artefact removal and breast region segmentation. For example, Qayyum and Basit [
Other works have centered on pectoral muscle segmentation. For example, Camilus et al. [
Several studies have focused on the detection and classification of MCCs using machine learning techniques. For example, El-Naqa et al. [
This paper presents a method for mammogram preprocessing and MCC detection. The preprocessing method employs an automatic artefact deletion and pectoral muscle removal algorithm based on background estimation, homogeneity-based region-growing segmentation, and contour fitting. The MCC detection method employs a CNN-based approach for ROI classification, along with morphological operations and wavelet reconstruction for reducing the number of FP cases. The proposed methods were evaluated using the publicly available mini-Mammographic Image Analysis Society (mini-MIAS) dataset and a private dataset (UTP dataset) in terms of segmentation accuracy in the preprocessing phase, as well as sensitivity and the mean FP rate per image in the MCC detection phase.
The proposed methodology for MCC detection was tested on two datasets. The first, the publicly available mini-MIAS dataset, contains 322 mammograms with a size of 1024 × 1024 pixels, 28 of which correspond to suspicious MCC cases as confirmed by biopsies (gold standard). The second, the UTP dataset, is part of the EJECALS dataset collected in 2014 by the Automatic Research Group at the Universidad Tecnológica de Pereira. This dataset contains 510 mammograms with a size of 3560 × 4640 pixels, 49 of which correspond to confirmed MCC cases. Both datasets have craniocaudal (CC) and mediolateral oblique (MLO) views.
Mammograms should be preprocessed before CAD algorithms can be applied to them for the tasks of classification and detection. The preprocessing steps include noise removal, radiopaque artefact suppression, and pectoral muscle removal. The pectoral muscle constitutes a predominant density region in the majority of MLO views of mammograms, which can affect the results of image processing. Removing it benefits both the accuracy and the speed of analysis, because the area of the image to be examined is significantly reduced.
Let
A four-step method is applied to remove the pectoral region. In the first step, the pectoral muscle is localized by dividing the preprocessed image
In the third step, segmentation of the pectoral muscle is performed using the region-growing algorithm with two parameters, namely the seed and threshold. The seed for all images is set to (10, 10) pixel coordinates. This ensures that the segmented region grows while covering the entire pectoral muscle. The optimal threshold
where μ
In the final step, pectoral muscle curve fitting is performed by obtaining the perimeter of
The perimeter of the pectoral muscle (
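The region-growing and contour-fitting steps can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the acceptance rule (intensity within `threshold` of the running region mean), the 4-connectivity, the function names, and the use of the right-most region pixel in each row as the contour are all assumptions. Only the (10, 10) seed and the third-order polynomial come from the text.

```python
from collections import deque

import numpy as np


def region_grow(image, seed=(10, 10), threshold=0.1):
    """Grow a 4-connected region from `seed`.

    A neighbour is accepted when its intensity differs from the running
    mean of the region by at most `threshold` (assumed homogeneity rule).
    """
    h, w = image.shape
    mask = np.zeros((h, w), dtype=bool)
    mask[seed] = True
    total, count = float(image[seed]), 1
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and not mask[nr, nc]:
                if abs(float(image[nr, nc]) - total / count) <= threshold:
                    mask[nr, nc] = True
                    total += float(image[nr, nc])
                    count += 1
                    queue.append((nr, nc))
    return mask


def fit_pectoral_contour(mask, degree=3):
    """Fit col = p(row) through the right-most region pixel of each row."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.array([np.flatnonzero(mask[r]).max() for r in rows])
    coeffs = np.polyfit(rows, cols, degree)
    return rows, np.polyval(coeffs, rows)


# Synthetic MLO-like image: bright triangular "muscle" in the upper-left corner.
rr, cc = np.indices((64, 64))
image = np.where(rr + cc < 40, 0.9, 0.3)
mask = region_grow(image, seed=(10, 10), threshold=0.1)
rows, contour = fit_pectoral_contour(mask)
```

In practice the threshold would be chosen per image by the iterative search described in the text; here it is fixed only to keep the sketch short.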
The detection of MCCs remains a major challenge due to the large variety of their widths, contrasts, lengths, and intersections, in addition to backgrounds with similar structures to that of MCCs and similar noise and signal levels. Furthermore, linear structures in mammograms such as ducts, blood vessels, and Cooper’s ligaments often produce a characteristically textured appearance that may contribute to a high level of FPs.
The proposed method for identifying MCCs in mammograms consists of two steps. The first step is the classification of ROIs with the aim of identifying MCCs, while the second step is MCC contrast enhancement for localization. In the first step, a CNN model was trained on a dataset manually constructed using the mini-MIAS and UTP datasets. The dataset included two classes, with “non-MCC” including normal ROIs (2,360 images) and “MCC” corresponding to MCC ROIs (1,932 images). Out of a total of 4,292 ROI images, 3,500 images (approximately 80%) were used for training the model and 792 images (approximately 20%) were used for its evaluation. The ROI window size was set to 101 × 101 pixels; this size is suitable for identifying MCCs without introducing geometric distortion caused by pixelation. The CNN architecture comprised an input layer of size 101 × 101; seven hidden convolutional layers with 3 × 3 kernels and 8, 16, 32, 64, 128, 256, and 512 filters, respectively, each having a max-pooling layer of size 2 × 2 for extracting large-scale features; an activation layer with the rectified linear unit function for fast training; and an output layer with the softmax function for outputting the probabilistic class interpretation. The model was trained using the stochastic gradient descent algorithm with the momentum optimizer, an initial learning rate of 0.01, 10 epochs, and 25 iterations. The architecture parameters were selected experimentally. The objective of the model is to classify whether a specific ROI in a mammogram contains MCCs, in order to obtain a set of possible MCC candidates. To do so, the mammogram is divided into regions of size 101 × 101 pixels, and each region is evaluated automatically by the CNN classifier.
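The division of a mammogram into 101 × 101 regions and the collection of MCC candidates can be sketched as below. The sketch assumes non-overlapping windows and skips border strips narrower than the window, neither of which is specified in the text. The `classify` argument is a hypothetical callable standing in for the trained CNN.

```python
import numpy as np

ROI_SIZE = 101  # window size stated in the text


def tile_rois(mammogram, size=ROI_SIZE):
    """Yield (row, col, roi) for each non-overlapping size x size window.

    Border strips narrower than `size` are skipped (an assumption).
    """
    h, w = mammogram.shape
    for r in range(0, h - size + 1, size):
        for c in range(0, w - size + 1, size):
            yield r, c, mammogram[r:r + size, c:c + size]


def candidate_rois(mammogram, classify, size=ROI_SIZE):
    """Return top-left corners of windows that `classify` flags as MCC."""
    return [(r, c) for r, c, roi in tile_rois(mammogram, size) if classify(roi)]


# Stand-in classifier: flags any window containing a very bright pixel.
mammogram = np.zeros((1024, 1024))
mammogram[250, 600] = 1.0
candidates = candidate_rois(mammogram, classify=lambda roi: roi.max() > 0.5)
```

With a 1024 × 1024 mini-MIAS image, this tiling produces 10 × 10 windows and leaves a 14-pixel strip uncovered on the right and bottom edges; overlapping windows or padding would be alternative choices.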
The second step is applied after obtaining ROIs containing MCCs. The basic idea is to enhance the contrast of MCCs to improve the accuracy of segmentation. To achieve this, each ROI is processed in three stages. The background is removed in the first stage, irrelevant structures are removed in the second stage, and MCCs are identified in the third stage. First, a morphological operation is used to remove the background. Next, wavelet reconstruction from approximation coefficients is performed to enhance the appearance of MCCs. Finally, segmentation is implemented using the binarization method. For background removal, a morphological transformation opening ○ is applied to the original ROI (
The wavelet (
The assumption that the diameter of MCCs varies from 0.1 mm to 1 mm [
where
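The three enhancement stages (background removal, wavelet reconstruction, binarization) can be illustrated with the NumPy-only sketch below. Several choices are assumptions made for illustration and are not taken from the text: the background is removed by subtracting a grey-scale opening with a flat 5 × 5 structuring element (a top-hat transform), the wavelet is a one-level Haar transform, and the binarization threshold is a fixed fraction of the maximum response.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _rank_filter(img, k, func):
    """Apply np.min or np.max over a k x k neighbourhood (edge-padded, odd k)."""
    padded = np.pad(img, k // 2, mode="edge")
    return func(sliding_window_view(padded, (k, k)), axis=(-2, -1))


def grey_opening(img, k=5):
    """Erosion followed by dilation with a flat k x k structuring element."""
    return _rank_filter(_rank_filter(img, k, np.min), k, np.max)


def haar_approximation(img):
    """One-level Haar reconstruction from approximation coefficients only.

    With all detail coefficients set to zero, each 2 x 2 block is replaced
    by its mean. An odd trailing row or column is cropped.
    """
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    blocks = img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return np.repeat(np.repeat(blocks, 2, axis=0), 2, axis=1)


def enhance_and_binarize(roi, k=5, threshold=0.5):
    """Top-hat background removal, Haar smoothing, then global thresholding."""
    top_hat = roi - grey_opening(roi, k)       # stage 1: remove background
    smooth = haar_approximation(top_hat)       # stage 2: suppress fine noise
    return smooth > threshold * smooth.max()   # stage 3: binarize


# Synthetic ROI: smooth intensity ramp plus one small bright spot.
roi = np.tile(np.linspace(0.2, 0.6, 101), (101, 1))
roi[40:42, 60:62] += 0.3
mask = enhance_and_binarize(roi)
```

The structuring element has to be larger than the largest expected MCC in pixels, so its size would depend on the pixel spacing of each dataset.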
The proposed algorithm for noise removal and radiopaque artefact suppression was applied to the CC and MLO views of mini-MIAS and UTP datasets, using a total of 832 mammograms. Segmentation accuracy was obtained by visual inspection; it was categorized as “successful” if the mammogram did not contain artefacts after the algorithm was applied, and “unacceptable” if artefacts remained. The success rate was 99.69% and 97.25% with the mini-MIAS and UTP datasets, respectively (
The proposed algorithm for pectoral muscle removal was applied to MLO mammogram views. Segmentation accuracy was obtained by visual inspection; it was categorized as “successful” if the pectoral muscle was considered to be well-segmented, and “unacceptable” otherwise. Furthermore, the mean rates of FPs and FNs were calculated for each dataset. The success rate was 91.92% and 95.12% for the mini-MIAS and UTP datasets, respectively. The unacceptable cases, as well as FPs and FNs, were due to an insignificant difference between the pixel intensity values of the breast and pectoral muscle since the algorithm considers pectoral region homogeneity and requires a well-defined pectoral boundary in terms of pixel intensity values (
The accuracy score for the training set was 96.26%, while for the validation set (792 images), it was 95.83% (
This paper presents a method for MCC detection from raw mammograms. The proposed method includes two modules, one for raw mammogram preprocessing and another for MCC detection. In the first module, noise and artefacts are removed and segmentation of the breast region is achieved. In the segmentation and subsequent removal of the pectoral muscle, an iterative algorithm is implemented to search for the optimal threshold to be used in the region-growing technique. This search is based on the homogeneity of the anatomical patterns of the image (pixel intensity) in the region to be segmented. Third-order polynomial fitting is used to correct the pectoral contour. The second module conducts a search of MCC candidates in the entire image using a CNN trained to recognize MCCs. Irrelevant structures in the ROI background are removed using morphological operations, and the MCC intensity is enhanced using the wavelet reconstruction method. Finally, the number of FP cases is reduced based on geometrical MCC assumptions. The performance of the proposed methodology was evaluated on the mini-MIAS and UTP datasets. The results demonstrated that the proposed preprocessing module outperformed state-of-the-art methods for breast segmentation and achieved relatively good results for pectoral muscle removal. We believe that this is due to the employed strategy of performing an iterative search based on the homogeneity of the ROI with the aim of finding optimal parameters for the segmentation of each pectoral muscle. Furthermore, the filtering and removal of unwanted regions from mammograms in conjunction with the CNN’s pattern-recognition and abstraction properties allowed the MCC detection module to achieve the highest test accuracy in identifying potential ROIs with MCCs compared to other state-of-the-art methods.
This work makes two main contributions. The first is that the presented method is an end-to-end methodology; that is, we start from a raw image and deliver a mammogram in which the microcalcifications are detected and localized.
Our second contribution is the way in which we segment the pectoral muscle. Specifically, the threshold with which the region-growing algorithm operates is adjusted iteratively, based on the fact that the pixels in the pectoral region have a uniform intensity.
This work was developed under the framework of the research project “Prototipo de un Sistema de recuperacion de información por contenido orientado a la clasificación de grupos de microcalcificaciones en mamografias - protocam”, funded by the Vice-Rectory for Research of the Universidad Tecnológica de Pereira.
No potential conflict of interest relevant to this article was reported.
Artefact removal steps. (A) Original image (
Steps 1 and 2 for pectoral muscle removal. (A) Artefact removal from the image (
Steps 3 and 4 for pectoral muscle suppression. (A) Region-growing segmentation after perimeter fitting (
Third-order polynomial pectoral contour fitting.
Microcalcification enhancement. (A) Original ROI
Microcalcification detection result.
Results of the proposed noise removal and radiopaque artefact suppression method compared to those of other methods reported in the literature (unit: %)
Study | UTP | Mini-MIAS |
---|---|---|
Proposed method | 97.25 | 99.69 |
Qayyum and Basit [ | - | 99.37 |
Slavkovic-Ilic et al. [ | - | 97.51 |
Yoon et al. [ | - | 93.16 |
Results of the proposed pectoral muscle removal method compared to those of other methods reported in the literature
Study | Dataset | Accuracy (%) | Other performance measures (%) |
---|---|---|---|
Our method | Mini-MIAS (282) | 91.92 | 7.01 (FP) – 11.34 (FN) |
Our method | UTP (159) | 95.12 | .70 (FP) – 14.95 (FN) |
Shinde and Rao [ | Mini-MIAS | 93.70 | Not mentioned |
Abdellatif et al. [ | Mini-MIAS (80) | Not mentioned | 1.20 (FP) – 20.4 (FN) |
Qayyum and Basit [ | Mini-MIAS | 93.00 | Not mentioned |
Slavkovic-Ilic et al. [ | Mini-MIAS | 87.57 | Not mentioned |
Camilus et al. [ | Mini-MIAS (84) | Not mentioned | 0.64 (FP) – 5.58 (FN) |
FP: false positive, FN: false negative.
The numbers in parentheses indicate the number of images in the subset.
Confusion matrix results for region of interest classification
Actual \ Predicted | Non-MCC | MCC | Total |
---|---|---|---|
Non-MCC | 590 (98) | 20 (2) | 610 |
MCC | 13 (7) | 169 (93) | 182 |
Total | 603 | 189 | 792 |
Values are presented as number (%).
MCC: microcalcification.
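The overall accuracy reported for the 792 evaluation ROIs can be recomputed directly from the counts in this confusion matrix. The short check below uses only those four counts; the sensitivity and specificity lines are derived from the same counts and are included as a convenience, not as figures quoted from the text.

```python
# Counts from the confusion matrix (rows: actual, columns: predicted).
tn, fp = 590, 20   # actual non-MCC
fn, tp = 13, 169   # actual MCC

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
sensitivity = tp / (tp + fn)   # fraction of MCC ROIs correctly flagged
specificity = tn / (tn + fp)   # fraction of non-MCC ROIs correctly rejected

print(f"total={total}")                   # total=792
print(f"accuracy={accuracy:.2%}")         # accuracy=95.83%
print(f"sensitivity={sensitivity:.2%}")   # sensitivity=92.86%
print(f"specificity={specificity:.2%}")   # specificity=96.72%
```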
Sensitivity and FP rate per image for microcalcification enhancement and localization
Study | Dataset | Sensitivity (%) | FP per image |
---|---|---|---|
Proposed method | Mini-MIAS (260) | 78 | 0.28 |
Proposed method | UTP (140) | 82 | 2.33 |
Wang and Yang [ | Private (292) | 85 | 0.13 |
Wang et al. [ | Private (292) | 90 | 0.24 |
Liu et al. [ | Private (205) | 92 | 1.12 |
El-Naqa et al. [ | Private (76) | 94 | 1.31 |
FP: false-positive.
The numbers in parentheses indicate the number of images in the subset.