Credit Scoring Using Logistic Regression and Decision Trees: A Comparative Machine Learning Using MATLAB Implementation

Shanu, Rilwan Olaniyi; Zubair, Adam Folohunso; Rahman, Mukaila Alade; Adenibuyan, Micheal Tokunbo

International Journal of Technology, Health and Sustainability, Volume 2, Issue 2, April-June 2026

Rilwan Olaniyi Shanu¹, Adam Folohunso Zubair², Mukaila Alade Rahman³, Micheal Tokunbo Adenibuyan⁴

¹,² Lecturer, Department of Computer Science, Faculty of Computing and Information Technology, Lagos State University Ojo, Lagos Nigeria.

³ Professor, Department of Computer Science, Faculty of Computing and Information Technology, Lagos State University Ojo, Lagos Nigeria.

⁴ Lecturer, Department of Computer Science, College of Computing, Bells University of Technology, Ota, Ogun State Nigeria.

Abstract

Credit risk evaluation is one of the most difficult responsibilities in today’s banking and financial services. Accurate prediction of the probability of default (PD) of loan applicants is vital for financial organizations to manage risk exposure, optimize lending decisions and remain within regulatory compliance. The aim of this study is to apply a holistic machine learning methodology to the credit scoring domain. The workflow is a nine-step pipeline in which two classification models are constructed, trained, tested and compared using MATLAB’s Risk Management Toolbox. The base model is a Logistic Regression and the challenger model is a Decision Tree. Both models are trained on a data set of 1,200 customer observations with nine predictor variables and one binary response variable. During preprocessing, data binning is performed using the Monotone Adjacent Pooling Algorithm (MAPA), and both weight-based and impurity-based approaches are used to study the relevance of the predictor variables. Model performance is evaluated on three established metrics: Accuracy Ratio (AR), Area Under the Receiver Operating Characteristic Curve (AUROC) and the Kolmogorov-Smirnov (KS) statistic. The results indicate that, with default binning, the Decision Tree model outperforms the Logistic Regression model on all three measures (AR = 0.389, AUROC = 0.695 and KS = 0.297 versus AR = 0.325, AUROC = 0.663 and KS = 0.232). However, when the binning settings are changed to the Split criterion with the Gini index, the Logistic Regression model gives better results (AUROC = 0.71). This work also addresses predictor importance and hyperparameter tuning, among other topics. The results support the notion that the choice of credit scoring model depends on the dataset and setup, and they underline the stringent validation protocols needed for ethical AI applications in finance.

Keywords: Credit scoring, Logistic regression, Decision tree, Probability of default, Machine learning, MATLAB implementation, Credit scorecard, Financial risk, Model validation
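
The three validation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' MATLAB Risk Management Toolbox workflow; it is a plain Python illustration under stated assumptions: scores are predicted default probabilities (higher means riskier), labels are 1 for default and 0 for non-default, and the function names `auroc`, `accuracy_ratio` and `ks_statistic` as well as the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def _split(scores: Sequence[float], labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Separate scores of defaulters (label 1) from non-defaulters (label 0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    return pos, neg


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the probability that a random defaulter scores higher
    than a random non-defaulter (ties count as one half)."""
    pos, neg = _split(scores, labels)
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy_ratio(auc: float) -> float:
    """Accuracy Ratio (Gini coefficient) via the identity AR = 2 * AUROC - 1."""
    return 2.0 * auc - 1.0


def ks_statistic(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Largest gap between the empirical score CDFs of the two classes."""
    pos, neg = _split(scores, labels)
    best = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(s <= t for s in pos) / len(pos)
        cdf_neg = sum(s <= t for s in neg) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


# Hypothetical toy data, not taken from the study.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0, 1, 0]

auc = auroc(toy_scores, toy_labels)
print(auc, accuracy_ratio(auc), ks_statistic(toy_scores, toy_labels))
```

The identity AR = 2 × AUROC − 1 is consistent, up to rounding, with the figures reported in the abstract: 2 × 0.695 − 1 = 0.390 against AR = 0.389 for the Decision Tree, and 2 × 0.663 − 1 = 0.326 against AR = 0.325 for the Logistic Regression. The pairwise loop is quadratic in the sample size, which is acceptable for a data set of about 1,200 observations; a rank-based computation would be the usual choice for larger samples.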

