Evaluation of a Lightweight CNN–Transformer Hybrid Model for Multi-Class Skin Lesion Classification under Data Distribution Shift

Dao Ngoc Ton

Lecturer, Thai Nguyen University of Technology (TNUT), Thai Nguyen University, Thai Nguyen, Vietnam. 

Abstract

Skin cancer, particularly melanoma, is among the malignancies for which early detection critically improves treatment outcomes. Although deep learning models have been widely applied to dermoscopic skin lesion classification, they still struggle under class imbalance, domain shift across data sources, and the need for reliable predictive probabilities. This paper evaluates a lightweight CNN–Transformer hybrid in which an EfficientNet-B0 backbone extracts local features and two Transformer encoder blocks model long-range contextual relations among lesion regions. The main contributions are: (i) a compact architecture (~7.5 M parameters) suitable for resource-constrained deployment; (ii) a composite loss combining cross-entropy, focal loss, and an entropy-based calibration regularization term; and (iii) a cross-dataset evaluation between HAM10000 and ISIC 2019 with reported statistical significance. Under stratified five-fold cross-validation on HAM10000, the proposed model attains Accuracy 91.8 ± 0.4 %, Balanced Accuracy 89.6 ± 0.5 %, Macro-F1 0.872 ± 0.006, and ECE 2.8 %, outperforming ConvNeXt-Tiny in Balanced Accuracy (89.6 % vs. 88.1 %, p < 0.05) and ECE (2.8 % vs. 3.9 %). When evaluated out-of-domain on ISIC 2019 without additional fine-tuning, the model achieves Accuracy 83.2 %, Balanced Accuracy 77.8 %, Macro-F1 0.759, and ECE 6.7 %. The results indicate that the proposed model maintains competitive classification performance and improves probability calibration under distribution shift, while a substantial generalization gap remains to be addressed.      
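The Expected Calibration Error (ECE) figures quoted above summarize how far predicted confidence departs from observed accuracy. The following is a minimal sketch of the commonly used binned estimator, not the paper's own evaluation code. It assumes top-label confidence and 10 equal-width bins, since the binning scheme is not stated here. The function name `expected_calibration_error` and the toy inputs are hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE estimate (assumption: equal-width bins over [0, 1]).

    confidences: top-class predicted probabilities, one per sample
    correct:     1/True if the top-class prediction was right, else 0/False
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    # Running sums per bin: [sample count, sum of confidence, sum of hits]
    bins = [[0, 0.0, 0.0] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Bin i covers [i/n_bins, (i+1)/n_bins); conf == 1.0 goes to the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx][0] += 1
        bins[idx][1] += conf
        bins[idx][2] += 1.0 if hit else 0.0

    ece = 0.0
    for count, conf_sum, hit_sum in bins:
        if count == 0:
            continue  # empty bins contribute nothing
        avg_conf = conf_sum / count
        accuracy = hit_sum / count
        # Weight each bin's |accuracy - confidence| gap by its share of samples.
        ece += (count / n) * abs(accuracy - avg_conf)
    return ece


# Toy example with made-up values (not data from the paper).
toy_conf = [0.9, 0.9, 0.6, 0.6]
toy_hit = [1, 1, 1, 0]
print(round(expected_calibration_error(toy_conf, toy_hit), 3))
```

In the toy example each of the two occupied bins has a confidence-accuracy gap of about 0.1 and holds half of the samples, so the estimate is approximately 0.1, that is, 10 %. A reported ECE of 2.8 % would correspond to a return value of about 0.028 under this convention.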

Keywords: Skin lesion classification, CNN–Transformer hybrid, Calibration, Domain shift, HAM10000, ISIC 2019, Focal loss

