ML | XGBoost (eXtreme Gradient Boosting)

XGBoost is an implementation of gradient-boosted decision trees. The library is written in C++ and was designed to improve training speed and model performance. It has been widely used in applied machine learning, and XGBoost models have featured in many winning Kaggle competition entries.

In this algorithm, decision trees are created sequentially. The first tree is fit to the training data and makes predictions. Each later tree is then fit to the errors of the trees built so far: XGBoost computes the gradient of the loss function at the current predictions, so the examples that are currently predicted badly have the most influence on the next tree. The outputs of these individual trees, each a weak predictor on its own, are added together to give a stronger and more precise model. XGBoost can work on regression, classification, ranking, and user-defined prediction problems.
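The sequential idea can be sketched in plain Python. This is a minimal illustration, assuming squared-error loss and one-split "stumps" on a single feature; the function names and the toy data are made up for this example, and XGBoost's real implementation (second-order gradients, regularization, multi-level trees) is far more elaborate.

```python
# Minimal gradient boosting for squared-error regression using one-split
# "stumps" as weak learners. Illustrative only; this is not XGBoost's code.

def fit_stump(xs, residuals):
    """Find the threshold on a 1-D feature that best fits the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_boosted(xs, ys, n_trees=50, learning_rate=0.3):
    """Build stumps one after another, each fit to the current residuals."""
    base = sum(ys) / len(ys)  # start from the mean prediction
    trees = []
    preds = [base] * len(xs)
    for _ in range(n_trees):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + learning_rate * tree(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(t(x) for t in trees)


def mse(model, xs, ys):
    """Mean squared error of a model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data, invented for this sketch.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3, 6.8, 8.1]

one_tree = fit_boosted(xs, ys, n_trees=1)
many_trees = fit_boosted(xs, ys, n_trees=50)
print("MSE with 1 tree:  ", mse(one_tree, xs, ys))
print("MSE with 50 trees:", mse(many_trees, xs, ys))
```

Each round only corrects part of the remaining error (scaled by learning_rate), which is why the training error keeps falling as more trees are added.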

XGBoost Features
The library is laser-focused on computational speed and model performance; as such, there are few frills.
Model Features
Three main forms of gradient boosting are supported:

  • Gradient Boosting
  • Stochastic Gradient Boosting
  • Regularized Gradient Boosting
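As a rough guide, the three forms map onto different groups of hyperparameters. The sketch below uses parameter names from the xgboost scikit-learn wrapper (n_estimators, max_depth, learning_rate, subsample, colsample_bytree, reg_alpha, reg_lambda); the values are arbitrary illustrations, not tuned recommendations.

```python
# Parameter names follow the xgboost scikit-learn wrapper (XGBClassifier /
# XGBRegressor). The values are arbitrary examples, not tuned settings.

# Gradient boosting: number of trees, tree depth, and shrinkage.
plain = {"n_estimators": 200, "max_depth": 4, "learning_rate": 0.1}

# Stochastic gradient boosting: each tree is trained on a random sample
# of the rows (subsample) and of the columns (colsample_bytree).
stochastic = {**plain, "subsample": 0.8, "colsample_bytree": 0.8}

# Regularized gradient boosting: L1 (reg_alpha) and L2 (reg_lambda)
# penalties on the leaf weights.
regularized = {**plain, "reg_alpha": 0.1, "reg_lambda": 1.0}

for name, params in [("plain", plain),
                     ("stochastic", stochastic),
                     ("regularized", regularized)]:
    print(name, params)

# With xgboost installed, any of these could be passed to a model, e.g.:
#   import xgboost as xgb
#   model = xgb.XGBClassifier(**stochastic)
```

The settings are not mutually exclusive; subsampling and regularization can be combined in one model.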

System Features

For use in a range of computing environments, this library provides:

  • Parallelization of tree construction
  • Distributed Computing for training very large models
  • Cache Optimization of data structures and algorithm
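On a single machine, parallel tree construction is typically controlled through the n_jobs parameter of the scikit-learn wrapper, and the histogram-based tree builder is selected with tree_method="hist". The snippet below is a small sketch under those assumptions; distributed training needs extra setup that is not shown here.

```python
import os

# os.cpu_count() may return None on some platforms, so fall back to 1.
n_threads = os.cpu_count() or 1

# n_jobs sets the number of threads used for tree construction;
# tree_method="hist" selects the histogram-based split finder.
system_params = {"n_jobs": n_threads, "tree_method": "hist"}
print(system_params)

# With xgboost installed, these could be passed to a model, e.g.:
#   import xgboost as xgb
#   model = xgb.XGBClassifier(**system_params)
```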

Steps to Install

XGBoost uses Git submodules to manage dependencies, so when you clone the repo, remember to specify the --recursive option:

git clone --recursive https://github.com/dmlc/xgboost

Windows users who use the GitHub tools can open the Git shell and run the following commands:

git submodule init
git submodule update

On macOS, first obtain gcc-8 with Homebrew (https://brew.sh/) to enable multi-threading (i.e. using multiple CPU threads for training). The default Apple Clang compiler does not support OpenMP, so building with the default compiler disables multi-threading.

brew install gcc@8

Then install XGBoost with pip:

pip3 install xgboost

You might need to run the command with the --user flag if you run into permission errors.
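To confirm that the installation worked, a quick check along the following lines can help. This is a small sketch; the helper name xgboost_status is made up for this example.

```python
import importlib.util


def xgboost_status():
    """Report whether the xgboost package can be found and imported."""
    if importlib.util.find_spec("xgboost") is None:
        return "xgboost is not installed"
    import xgboost  # imported here so the check also works when it is missing
    return f"xgboost {xgboost.__version__} is installed"


print(xgboost_status())
```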

Code: Python code for XGB Classifier

# Importing the libraries
import pandas as pd
# Importing the dataset
dataset = pd.read_csv("Churn_Modelling.csv")
X = dataset.iloc[:, 3:13].values
y = dataset.iloc[:, 13].values
# Encoding categorical data
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
# Label-encode categorical column 2 as integers
labelencoder_X_2 = LabelEncoder()
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])
# One-hot encode categorical column 1. The old
# OneHotEncoder(categorical_features=[1]) argument was removed from
# scikit-learn, so ColumnTransformer selects the column instead.
# drop="first" omits one dummy column, as X = X[:, 1:] did before.
ct = ColumnTransformer(
        [("onehot", OneHotEncoder(drop="first"), [1])],
        remainder="passthrough", sparse_threshold=0)
X = ct.fit_transform(X).astype(float)
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size = 0.2, random_state = 0)
# Fitting XGBoost to the training data
import xgboost as xgb
my_model = xgb.XGBClassifier()
my_model.fit(X_train, y_train)
# Predicting the Test set results
y_pred = my_model.predict(X_test)
# Making the Confusion Matrix and computing the accuracy
from sklearn.metrics import accuracy_score, confusion_matrix
cm = confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)
print(cm)
print(accuracy)


Accuracy on the test set will be about 0.8645, though the exact value can vary with library versions.
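Accuracy can be read directly off a confusion matrix: it is the sum of the diagonal (correct predictions) divided by the total number of examples. The sketch below uses a made-up 2x2 matrix purely for illustration; it is not the output of the model above, and the function name is hypothetical.

```python
def accuracy_from_confusion_matrix(cm):
    """Accuracy = correctly classified examples / all examples."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


# Made-up counts for illustration. Rows are true classes and columns are
# predicted classes, the layout sklearn.metrics.confusion_matrix uses.
example_cm = [[90, 10],
              [20, 30]]

print(accuracy_from_confusion_matrix(example_cm))  # (90 + 30) / 150 = 0.8
```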
