Evaluation of a Lightweight CNN–Transformer Hybrid Model for Multi-Class Skin Lesion Classification under Data Distribution Shift

Ton, Dao Ngoc

International Journal of Technology, Health and Sustainability, Volume 2, Issue 2, April-June 2026

Dao Ngoc Ton

Lecturer, Thai Nguyen University of Technology (TNUT), Thai Nguyen University, Thai Nguyen, Vietnam.

Abstract

Skin cancer, particularly melanoma, is among the malignancies for which early detection critically improves treatment outcomes. Although deep learning models have been widely applied to dermoscopic skin lesion classification, they still struggle under class imbalance, domain shift across data sources, and the need for reliable predictive probabilities. This paper evaluates a lightweight CNN–Transformer hybrid in which an EfficientNet-B0 backbone extracts local features and two Transformer encoder blocks model long-range contextual relations among lesion regions. The main contributions are: (i) a compact architecture (~7.5 M parameters) suitable for resource-constrained deployment; (ii) a composite loss combining cross-entropy, focal loss, and an entropy-based calibration regularization term; and (iii) a cross-dataset evaluation between HAM10000 and ISIC 2019 with reported statistical significance. Under stratified five-fold cross-validation on HAM10000, the proposed model attains Accuracy 91.8 ± 0.4 %, Balanced Accuracy 89.6 ± 0.5 %, Macro-F1 0.872 ± 0.006, and expected calibration error (ECE) 2.8 %, outperforming ConvNeXt-Tiny in Balanced Accuracy (89.6 % vs. 88.1 %, p < 0.05) and ECE (2.8 % vs. 3.9 %). When evaluated out-of-domain on ISIC 2019 without additional fine-tuning, the model achieves Accuracy 83.2 %, Balanced Accuracy 77.8 %, Macro-F1 0.759, and ECE 6.7 %. The results indicate that the proposed model maintains competitive classification performance and improves probability calibration under distribution shift, while a substantial generalization gap remains to be addressed.
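As a rough illustration of contribution (ii), the sketch below combines cross-entropy, focal loss, and an entropy term for a single sample. The abstract does not give the weights, the focusing parameter, or the sign and form of the entropy regularizer, so `gamma`, `w_focal`, `w_entropy`, and the choice to subtract the predictive entropy (a confidence penalty) are all assumptions made here for illustration, not the paper's formulation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def composite_loss(logits, target, gamma=2.0, w_focal=1.0, w_entropy=0.1):
    """Hypothetical composite loss for one sample.

    gamma, w_focal and w_entropy are illustrative values, not taken
    from the paper.
    """
    probs = softmax(logits)
    p_t = max(probs[target], 1e-12)  # probability of the true class

    # Standard cross-entropy term.
    ce = -math.log(p_t)

    # Focal term (Lin et al., 2017): down-weights well-classified samples.
    focal = (1.0 - p_t) ** gamma * ce

    # Shannon entropy of the predicted distribution.
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)

    # ASSUMPTION: the regularizer rewards higher entropy, i.e. it
    # penalises over-confident predictions. The source does not say.
    return ce + w_focal * focal - w_entropy * entropy
```

For example, `composite_loss([5.0, 0.0, 0.0], target=0)` is smaller than `composite_loss([5.0, 0.0, 0.0], target=1)`, since only the cross-entropy and focal terms depend on the label.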

Keywords: Skin lesion classification, CNN–Transformer hybrid, Calibration, Domain shift, HAM10000, ISIC 2019, Focal loss
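The ECE values quoted in the abstract are not accompanied by a binning scheme, so the following is only a minimal sketch of the commonly used equal-width-bin estimator. The function name, the default of 15 bins, and the half-open bin edges are assumptions made for this example.

```python
def expected_calibration_error(confidences, correct, n_bins=15):
    """Equal-width-bin ECE estimate.

    confidences: top-class probability per sample, each in [0, 1].
    correct:     1 (or True) if the prediction was right, else 0.
    n_bins:      number of bins; 15 is an assumed default.
    """
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")

    conf_sum = [0.0] * n_bins
    hit_sum = [0.0] * n_bins
    count = [0] * n_bins

    for c, ok in zip(confidences, correct):
        # Bins are [k/n_bins, (k+1)/n_bins); a confidence of 1.0 is
        # clamped into the last bin.
        k = min(int(c * n_bins), n_bins - 1)
        conf_sum[k] += c
        hit_sum[k] += float(ok)
        count[k] += 1

    ece = 0.0
    for k in range(n_bins):
        if count[k] == 0:
            continue
        accuracy = hit_sum[k] / count[k]
        mean_conf = conf_sum[k] / count[k]
        # Weight each bin's |accuracy - confidence| gap by its share of samples.
        ece += (count[k] / n) * abs(accuracy - mean_conf)
    return ece
```

Four predictions made at confidence 0.9, of which three are correct, fall into a single bin with accuracy 0.75, giving an ECE of about 0.15.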


