Adaboost from scratch in Python

1 minute read

The learning framework of the Adaboost algorithm is:

  • Data: a classification data set loaded from `sklearn.datasets`.

  • Algorithm: AdaBoost, an ensemble (boosting) algorithm.

  • Hypothesis: The hypothesis function assumes that the predictions of the previous classifiers have an impact on the new prediction.

  • Loss: The loss function computes the errors using the exponential loss.
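
As a small sketch of the last point: for labels y ∈ {−1, +1} and a real-valued score f(x), the exponential loss is exp(−y·f(x)), so correctly classified samples contribute less than 1 and misclassified samples contribute more than 1. The function name below is illustrative, not part of the original implementation:

```python
import numpy as np

def exponential_loss(y_true, y_score):
    """Mean exponential loss for labels in {-1, +1}."""
    return np.mean(np.exp(-y_true * y_score))

y = np.array([1.0, -1.0, 1.0, 1.0])
f = np.array([0.8, -0.5, -0.2, 1.0])  # third sample is misclassified
loss = exponential_loss(y, f)
```

Because the loss grows exponentially with the (negative) margin, misclassified samples dominate it, which is exactly why AdaBoost focuses the next weak learner on them.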

Introduction

The data selected to learn and predict is the breast cancer data set from the UCI Machine Learning repository, which is bundled with the `sklearn` package. This multivariate data set contains the cancer diagnosis for 569 samples, based on 30 features extracted from digitized images of a fine needle aspirate (FNA) of a breast mass. The target is to predict whether the diagnosis is benign or malignant, and its distribution is non-linear.
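
The data set can be loaded directly from `sklearn`, for example:

```python
from sklearn.datasets import load_breast_cancer

# Breast cancer (Wisconsin) data set bundled with scikit-learn
data = load_breast_cancer()
X, y = data.data, data.target

print(X.shape)            # (569, 30): 569 samples, 30 features
print(data.target_names)  # ['malignant' 'benign'], encoded as 0 and 1
```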

The AdaBoost technique was chosen because it was designed to tackle non-linear distributions, although it can be used on linear ones as well. AdaBoost is an ensemble method: it combines basic decision trees that perform only slightly better than flipping a coin, known as weak learners. Each prediction is adjusted iteratively through the weights of the samples and the influence of each weak learner, so the ensemble arrives at better predictions. In this implementation, the data set is resampled according to the weights assigned to misclassified samples, in a loop, until all the specified weak learners have learnt.
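
The loop described above can be sketched in pure NumPy. This is a minimal illustrative version, not the notebook's code: it uses decision stumps as weak learners and the common reweighting variant (adjusting sample weights directly) rather than explicit resampling, and all names are assumptions:

```python
import numpy as np

def fit_stump(X, y, w):
    """Weighted decision stump: pick the best (feature, threshold, polarity)."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
                err = np.sum(w[pred != y])       # weighted error
                if err < best_err:
                    best_err, best = err, (j, thr, pol)
    return best, best_err

def stump_predict(X, stump):
    j, thr, pol = stump
    return np.where(pol * (X[:, j] - thr) >= 0, 1, -1)

def adaboost_fit(X, y, n_rounds=10):
    """y must be encoded in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                      # start with uniform weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump, err = fit_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)     # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)    # influence of this learner
        pred = stump_predict(X, stump)
        w *= np.exp(-alpha * y * pred)           # boost misclassified samples
        w /= w.sum()                             # renormalise
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    agg = sum(a * stump_predict(X, s) for s, a in zip(stumps, alphas))
    return np.sign(agg)
```

The key steps are the learner influence `alpha`, which grows as the weighted error shrinks, and the weight update, which increases the weight of misclassified samples so the next stump focuses on them.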

This report presents an implementation of the AdaBoost algorithm for binary classification, where the target has a non-linear distribution. The algorithm is simple to understand and implement, yet has strong predictive power. More recently, variations on the boosting technique have appeared, such as XGBoost, which improves computational performance; XGBoost has been used to win several competitions by a wide margin in both performance and accuracy.

Python code

The implementation can be found in Google Colab: Adaboost from scratch in Python.

There is also a brief explanation of some key features of this implementation. Here is the YouTube video: 5 min preso

Documentation

The theoretical details are in this document

The 5 min presentation is here