Recently, a few people asked me an interesting question: does XGBoost perform better than Logistic Regression for classification problems in machine learning? It is almost a philosophical question; choosing the right algorithm depends on the nature of the data and the scenario at hand. To some extent, it is like comparing apples and oranges: both are fruit and good for your health, yet they taste completely different. To demonstrate the difference, I used the classic "Pima Indians Diabetes" dataset from UCI to perform diabetes classification/prediction with both Logistic Regression and Gradient Boosting.
XGBoost (eXtreme Gradient Boosting) is a machine learning library that implements an optimized, distributed version of the Gradient Boosting algorithm. Fundamentally, Gradient Boosting is an ensemble approach for classification and prediction: it builds a sequence of weak learners (typically decision trees), each one correcting the errors of those before it. In contrast, Logistic Regression is a linear (or curvilinear) analysis approach that uses a generalized linear equation to describe the relationship between a set of predictor variables and a binary outcome.
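To make the comparison concrete, here is a minimal sketch of fitting both models on the Pima dataset with scikit-learn and the xgboost library. The local file name pima-indians-diabetes.csv, the column names, and the hyperparameters are assumptions for illustration, not the exact setup used in this article.

```python
# A minimal sketch, assuming the Pima Indians Diabetes CSV is available
# locally as "pima-indians-diabetes.csv" (8 feature columns + "Outcome").
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

# Column names follow the common UCI convention (an assumption here).
cols = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
        "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome"]
df = pd.read_csv("pima-indians-diabetes.csv", names=cols)

X, y = df.drop(columns="Outcome"), df["Outcome"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

# Logistic Regression: a generalized linear model for binary outcomes.
logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_train, y_train)

# XGBoost: gradient-boosted decision trees (illustrative hyperparameters).
xgb = XGBClassifier(n_estimators=100, max_depth=3, eval_metric="logloss")
xgb.fit(X_train, y_train)

# Compare held-out accuracy of the two classifiers.
for name, model in [("Logistic Regression", logreg), ("XGBoost", xgb)]:
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name} test accuracy: {acc:.3f}")
```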
[Figure: Flow chart of Logistic Regression, from Researchgate.net]
Both algorithms have their own characteristics and areas of strength, which I have tried to summarize in the table below.