Dao Ngoc Ton
Lecturer, Thai Nguyen University of Technology (TNUT), Thai Nguyen University, Thai Nguyen, Vietnam.
Abstract
Skin cancer, particularly melanoma, is among the malignancies for which early detection critically improves treatment outcomes. Although deep learning models have been widely applied to dermoscopic skin lesion classification, they still struggle with class imbalance, domain shift across data sources, and the need for reliable predictive probabilities. This paper evaluates a lightweight CNN–Transformer hybrid in which an EfficientNet-B0 backbone extracts local features and two Transformer encoder blocks model long-range contextual relations among lesion regions. The main contributions are: (i) a compact architecture (~7.5 M parameters) suitable for resource-constrained deployment; (ii) a composite loss combining cross-entropy, focal loss, and an entropy-based calibration regularization term; and (iii) a cross-dataset evaluation between HAM10000 and ISIC 2019 with statistical significance testing. Under stratified five-fold cross-validation on HAM10000, the proposed model attains an Accuracy of 91.8 ± 0.4 %, a Balanced Accuracy of 89.6 ± 0.5 %, a Macro-F1 of 0.872 ± 0.006, and an ECE of 2.8 %, outperforming ConvNeXt-Tiny in Balanced Accuracy (89.6 % vs. 88.1 %, p < 0.05) and ECE (2.8 % vs. 3.9 %). When evaluated out-of-domain on ISIC 2019 without additional fine-tuning, the model achieves an Accuracy of 83.2 %, a Balanced Accuracy of 77.8 %, a Macro-F1 of 0.759, and an ECE of 6.7 %. The results indicate that the proposed model maintains competitive classification performance and improves probability calibration under distribution shift, while a substantial generalization gap remains to be addressed.
Keywords: Skin lesion classification, CNN–Transformer hybrid, Calibration, Domain shift, HAM10000, ISIC 2019, Focal loss
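The composite objective and the ECE metric summarized in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not state the exact form of the entropy-based regularizer or the loss weights, so this sketch assumes a confidence-penalty term that subtracts β times the predictive entropy (discouraging over-confident outputs), and uses hypothetical weights `lam`, `gamma`, and `beta`. ECE is computed with the widely used equal-width binning estimator (15 bins, top-label confidence). The function names `softmax`, `composite_loss`, and `expected_calibration_error` are illustrative and are not taken from the author's implementation.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def composite_loss(logits, labels, lam=1.0, gamma=2.0, beta=0.1, eps=1e-12):
    """Hypothetical composite objective: CE + lam * focal - beta * entropy.

    ASSUMPTION: the weights lam, gamma, beta and the confidence-penalty form
    of the entropy term are illustrative; the abstract does not specify them.
    """
    p = softmax(logits)
    n = labels.shape[0]
    p_true = p[np.arange(n), labels]  # probability assigned to the true class
    ce = -np.log(p_true + eps)  # per-sample cross-entropy
    focal = -((1.0 - p_true) ** gamma) * np.log(p_true + eps)  # focal loss
    entropy = -(p * np.log(p + eps)).sum(axis=1)  # predictive entropy
    # Subtracting beta * entropy rewards less peaked outputs, which
    # discourages over-confidence (a "confidence penalty").
    return float(np.mean(ce + lam * focal - beta * entropy))


def expected_calibration_error(probs, labels, n_bins=15):
    """ECE with equal-width confidence bins (top-label formulation)."""
    conf = probs.max(axis=1)
    pred = probs.argmax(axis=1)
    correct = (pred == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    n = labels.shape[0]
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            # Weight each bin's |accuracy - confidence| gap by its share of samples.
            gap = abs(correct[in_bin].mean() - conf[in_bin].mean())
            ece += in_bin.sum() / n * gap
    return float(ece)


if __name__ == "__main__":
    # Random 7-class example (HAM10000 has seven diagnostic categories).
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(32, 7))
    labels = rng.integers(0, 7, size=32)
    print("composite loss:", composite_loss(logits, labels))
    print("ECE:", expected_calibration_error(softmax(logits), labels))
```

Setting `beta=0` and `lam=0` reduces the sketch to plain cross-entropy, which makes it easy to check how much each term contributes. In a real training loop the same terms would be written with an autodiff framework so that gradients flow through the entropy penalty.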
