Recently, a few people asked me an interesting question: does XGBoost perform better than Logistic Regression for classification problems in machine learning? It is almost a philosophical question; choosing the right algorithm depends on the nature of the data and the scenario at hand. To some extent, it is like comparing apples and oranges: both are fruit and good for your health, yet they taste completely different. To demonstrate the difference, I used the classic "Pima Indians Diabetes" dataset from UCI to perform diabetes classification/prediction with both Logistic Regression and Gradient Boosting.
XGBoost (eXtreme Gradient Boosting) is a machine learning library that implements an optimized, distributed version of the Gradient Boosting algorithm. Fundamentally, Gradient Boosting is an ensemble approach for classification and prediction: it builds a sequence of weak learners (typically decision trees), each one correcting the errors of those before it. In contrast, Logistic Regression is a linear (or curvilinear) analysis approach that uses a generalized linear equation to describe the relationship between a set of predictor variables and a binary outcome.
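To make the comparison concrete, here is a minimal sketch of fitting both models on the Pima dataset with scikit-learn and the xgboost library. The local file name pima-indians-diabetes.csv, the column names, and the hyperparameters are assumptions for illustration, not the exact setup used in this article.

```python
# A minimal sketch, assuming the Pima Indians Diabetes CSV is available
# locally as "pima-indians-diabetes.csv" (8 feature columns + "Outcome").
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

# Column names follow the common UCI convention (an assumption here).
cols = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
        "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome"]
df = pd.read_csv("pima-indians-diabetes.csv", names=cols)

X, y = df.drop(columns="Outcome"), df["Outcome"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

# Logistic Regression: a generalized linear model for binary outcomes.
logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_train, y_train)

# XGBoost: gradient-boosted decision trees (illustrative hyperparameters).
xgb = XGBClassifier(n_estimators=100, max_depth=3, eval_metric="logloss")
xgb.fit(X_train, y_train)

# Compare held-out accuracy of the two classifiers.
for name, model in [("Logistic Regression", logreg), ("XGBoost", xgb)]:
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name} test accuracy: {acc:.3f}")
```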
[Figure: Flow chart of Logistic Regression, from Researchgate.net]
Both algorithms have their own characteristics and areas of strength, which I have tried to summarize in the table below.