Ensemble Methods in Python

Last Updated : 23 Mar, 2026

Ensemble methods in Python are machine learning techniques that combine multiple models to improve overall performance and accuracy. By aggregating predictions from different algorithms, ensemble methods help reduce errors, handle variance and produce more robust models.

  • Combines multiple models to achieve better predictive performance
  • Includes techniques like bagging, boosting, and stacking
  • Commonly used in classification and regression tasks for improved accuracy

Architecture of Ensemble Models

The architecture of ensemble learning defines how multiple models are organized, trained and combined to generate a final prediction. Instead of relying on a single algorithm, an ensemble architecture introduces multiple learning layers that work together to improve predictive performance, stability and generalization.

1. Base Learners

Base learners form the first layer of the ensemble system. These are individual machine learning models trained on the original dataset.

  • Each base model learns patterns independently.
  • They can be homogeneous (same algorithm) or heterogeneous (different algorithms).
  • Their goal is to produce initial predictions that capture different aspects of the data.

Diversity among base learners is important because combining similar models may not significantly improve performance.
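The effect of diversity can be illustrated with a small simulation. This is only a sketch under a simplifying assumption that is not part of the article: each model's prediction error is modelled as unit-variance Gaussian noise, either independent across models (diverse learners) or sharing a large common component (similar learners). The names and the 0.9/0.1 split are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_models = 10_000, 5

# Diverse learners: each model's error is independent of the others
independent_errors = rng.normal(size=(n_samples, n_models))

# Similar learners: 90% of each model's error variance comes from a shared component
shared = rng.normal(size=(n_samples, 1))
own = rng.normal(size=(n_samples, n_models))
correlated_errors = np.sqrt(0.9) * shared + np.sqrt(0.1) * own

# Variance of a single model's error versus the error of the averaged ensemble
single_var = independent_errors[:, 0].var()
independent_avg_var = independent_errors.mean(axis=1).var()
correlated_avg_var = correlated_errors.mean(axis=1).var()

print("Single model error variance     :", round(single_var, 3))
print("Average of 5 diverse models     :", round(independent_avg_var, 3))
print("Average of 5 similar models     :", round(correlated_avg_var, 3))
```

Under these assumptions, averaging independent errors shrinks the variance to roughly one fifth, while averaging strongly correlated errors leaves it almost unchanged.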

2. Meta Learner

The meta learner operates at the second level of the architecture and is responsible for combining predictions from base learners.

  • It is trained on the outputs (predictions) generated by base models.
  • It learns how to assign optimal weights or relationships among these predictions.
  • The final prediction is generated after processing the combined outputs.

This two-level structure is commonly used in stacking, while other ensemble methods such as bagging and boosting change how the base learners themselves are trained and aggregated.
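As a rough sketch of the base-learner and meta-learner layers, scikit-learn's StackingClassifier wires the two levels together. The dataset (load_breast_cancer), the choice of base learners and all hyperparameters below are assumptions made for illustration, not the article's setup.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Level 1: heterogeneous base learners (a tree ensemble and a scaled k-NN)
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
]

# Level 2: the meta learner is fitted on cross-validated predictions
# of the base learners rather than on the raw features
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

test_accuracy = stack.score(X_test, y_test)
print("Stacking accuracy:", test_accuracy)
```

Using cross-validated predictions (cv=5 here) as meta-learner input is a common way to keep the meta learner from simply trusting base models that memorized the training data.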

Types of Ensemble Methods

Ensemble methods combine multiple models in different ways to improve predictive performance. Understanding the main types helps choose the right strategy for your specific problem and dataset.

1. Max Voting

Max voting, also known as majority voting, is an ensemble technique primarily used for classification problems. In this method, multiple models make independent predictions and the class that receives the highest number of votes is selected as the final output. It improves prediction stability by combining the strengths of different classifiers.

  • Hard Voting: Each base classifier predicts a class label and the final prediction is the class with the most votes.
  • Soft Voting: Each model predicts probabilities for each class and the final prediction is the class with the highest average probability across all models.
  • Weighted Voting: Some models may have more influence than others. Weights are assigned based on model performance, so stronger models contribute more to the final prediction.
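The three voting rules can give different answers for the same sample. The following dependency-free sketch uses made-up probabilities from three hypothetical classifiers (the numbers, weights and helper names are assumptions for illustration only).

```python
from collections import Counter

# Hypothetical outputs of three classifiers for ONE sample:
# [probability of class 0, probability of class 1]
probas = [
    [0.55, 0.45],  # model A leans slightly towards class 0
    [0.60, 0.40],  # model B leans slightly towards class 0
    [0.10, 0.90],  # model C is very confident in class 1
]

def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])

def hard_vote(probas):
    """Each model votes for its most likely class; the majority wins."""
    labels = [argmax(p) for p in probas]
    return Counter(labels).most_common(1)[0][0]

def soft_vote(probas, weights=None):
    """Average the class probabilities (optionally weighted) and pick the largest."""
    weights = weights or [1.0] * len(probas)
    total = sum(weights)
    n_classes = len(probas[0])
    avg = [
        sum(w * p[c] for w, p in zip(weights, probas)) / total
        for c in range(n_classes)
    ]
    return argmax(avg), avg

hard_label = hard_vote(probas)
soft_label, soft_avg = soft_vote(probas)
weighted_label, weighted_avg = soft_vote(probas, weights=[3, 3, 1])

print("Hard voting    :", hard_label)      # 0 (two of three models vote for class 0)
print("Soft voting    :", soft_label)      # 1 (model C's confidence dominates the average)
print("Weighted voting:", weighted_label)  # 0 (models A and B carry more weight)
```

Hard voting ignores how confident each model is, soft voting lets a very confident model outweigh two hesitant ones, and weights shift the balance back towards the models that are trusted more.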

Step By Step Implementation

Here we implement hard voting and soft voting.

Step 1: Load and Preprocess Data

Load the dataset and split it into features (X) and target (y). Then we perform a train-test split and scale the features for better model performance.


Python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("dataset Path") 

X = df.drop('target', axis=1)
y = df['target']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

Step 2: Initialize Base Classifiers

Here we define the individual models that will form the ensemble: Logistic Regression, Decision Tree, Random Forest and XGBoost.

Python
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

# Initialize base classifiers
log_reg = LogisticRegression(max_iter=300, random_state=42)
dt_clf = DecisionTreeClassifier(random_state=42)
rf_clf = RandomForestClassifier(n_estimators=100, random_state=42)
# use_label_encoder is deprecated in recent XGBoost releases, so it is omitted here
xgb_clf = XGBClassifier(eval_metric='logloss', random_state=42)

Step 3: Train Voting Classifier

We create a hard voting classifier and a soft voting classifier, train each on the training data and make predictions on the test set.

Python
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score

# Hard Voting
hard_voting = VotingClassifier(
    estimators=[('lr', log_reg), ('dt', dt_clf), ('rf', rf_clf), ('xgb', xgb_clf)],
    voting='hard'
)
hard_voting.fit(X_train, y_train)
y_pred_hard = hard_voting.predict(X_test)
print("Hard Voting Accuracy:", accuracy_score(y_test, y_pred_hard))

# Soft Voting
soft_voting = VotingClassifier(
    estimators=[('lr', log_reg), ('dt', dt_clf), ('rf', rf_clf), ('xgb', xgb_clf)],
    voting='soft'
)
soft_voting.fit(X_train, y_train)
y_pred_soft = soft_voting.predict(X_test)
print("Soft Voting Accuracy:", accuracy_score(y_test, y_pred_soft))

Output:

Hard Voting Accuracy: 1.0
Soft Voting Accuracy: 1.0
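Weighted voting, listed among the voting types but not covered in the steps above, can be sketched with the weights parameter of VotingClassifier. The synthetic data from make_classification and the weights [1, 1, 2] are assumptions chosen for illustration; in practice the weights would typically be derived from validation performance.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: the article's CSV is not used here)
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

weighted_voting = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=300, random_state=42)),
        ("dt", DecisionTreeClassifier(random_state=42)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
    ],
    voting="soft",
    weights=[1, 1, 2],  # illustrative, untuned: the random forest counts double
)
weighted_voting.fit(X_train, y_train)

weighted_accuracy = weighted_voting.score(X_test, y_test)
print("Weighted Soft Voting Accuracy:", weighted_accuracy)
```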

2. Averaging Method

The averaging method is an ensemble technique mainly used for regression problems. Multiple models are trained independently and their predictions are averaged to produce the final output. Because the individual models' errors partly cancel out when averaged, variance is reduced and the ensemble often performs better than any single model.

  • Each regression model is trained independently on the same dataset.
  • Predictions from all models are collected for each data point.
  • The final prediction is calculated as the average of all model predictions.
  • This method reduces overfitting and variance while improving robustness.

Implementation

Here we build an averaging ensemble regression model using the Boston Housing Dataset to improve prediction accuracy.

  • The dataset is loaded, converted to numeric format and split into training and testing data.
  • Three regression models are trained separately and each model generates predictions on the test data.
  • The final prediction is calculated by averaging the three model outputs and performance is measured using the R2 score.

Python
import pandas as pd

from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

# Load the Boston Housing data from OpenML and make sure every column is numeric
boston = fetch_openml(name="boston", version=1, as_frame=True)

X = boston.data.apply(pd.to_numeric)
y = pd.to_numeric(boston.target)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Three different regressors, each trained on the same training data
model1 = LinearRegression()
model2 = DecisionTreeRegressor(random_state=42)
model3 = RandomForestRegressor(n_estimators=100, random_state=42)

model1.fit(X_train, y_train)
model2.fit(X_train, y_train)
model3.fit(X_train, y_train)

pred1 = model1.predict(X_test)
pred2 = model2.predict(X_test)
pred3 = model3.predict(X_test)

# Final prediction: simple (unweighted) average of the three models
y_pred = (pred1 + pred2 + pred3) / 3

r2 = r2_score(y_test, y_pred)

print("R2 Score :", r2)

Output:

R2 Score : 0.8872852109557785
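The manual average above can also be expressed with scikit-learn's VotingRegressor, which fits the models and averages their predictions in one estimator. This sketch uses synthetic data from make_regression as a stand-in (an assumption, so it runs without downloading anything) and checks that the result matches a hand-computed mean.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption: not the Boston Housing data)
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

averager = VotingRegressor(
    estimators=[
        ("lr", LinearRegression()),
        ("dt", DecisionTreeRegressor(random_state=42)),
        ("rf", RandomForestRegressor(n_estimators=100, random_state=42)),
    ]
)
averager.fit(X_train, y_train)
ensemble_pred = averager.predict(X_test)

# Sanity check: same result as averaging the fitted models by hand
manual_pred = np.mean(
    [est.predict(X_test) for est in averager.estimators_], axis=0
)
print("Matches manual average:", np.allclose(ensemble_pred, manual_pred))
print("R2 Score :", averager.score(X_test, y_test))
```

VotingRegressor also accepts a weights argument, which turns the simple mean into a weighted average.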

3. Bagging (Bootstrap Aggregation)

Bagging improves model stability and accuracy by training multiple models on different random subsets of the dataset and aggregating their predictions. Unlike Random Forest, which randomly selects a subset of features at each split, plain bagging by default lets each base model use all features. Bagging is especially effective at reducing variance, which in turn limits overfitting.

  • Generate multiple bootstrap samples from the training dataset (random sampling with replacement).
  • Train a base model independently on each bootstrap sample.
  • Aggregate the predictions of all base models into a single output: for regression, take the average of the predictions; for classification, use the majority vote.
  • The final prediction is more robust and less sensitive to noise than individual base models.
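The steps above can be sketched by hand with NumPy and decision trees. This is a minimal illustration, not how BaggingClassifier is implemented internally; the Iris data, the number of models and the variable names are assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

rng = np.random.default_rng(42)
n_models = 10
n_classes = len(np.unique(y_train))
all_preds = []

for _ in range(n_models):
    # Step 1: bootstrap sample (same size as the training set, drawn with replacement)
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # Step 2: train one base model on that sample
    tree = DecisionTreeClassifier(random_state=0)
    tree.fit(X_train[idx], y_train[idx])
    all_preds.append(tree.predict(X_test))

all_preds = np.array(all_preds)  # shape: (n_models, n_test_samples)

# Step 3: majority vote, counting the votes per class for every test sample
votes = np.apply_along_axis(
    lambda column: np.bincount(column, minlength=n_classes), 0, all_preds
)  # shape: (n_classes, n_test_samples)
bagged_pred = votes.argmax(axis=0)

bagged_accuracy = np.mean(bagged_pred == y_test)
print("Accuracy (manual bagging):", bagged_accuracy)
```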

Implementation

Here we implement the Bagging ensemble technique using Decision Trees on the Iris dataset for classification.

  • The Iris dataset is loaded, split into training and testing sets and used to train a BaggingClassifier with 10 DecisionTree base models.
  • The bagging model fits multiple trees on bootstrap samples of the training data and aggregates their predictions to reduce variance and improve stability.
  • Predictions are made on the test set and performance is evaluated using Accuracy Score.

Python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Note: the base_estimator parameter was renamed to estimator in scikit-learn 1.2
bagging_model = BaggingClassifier(
    estimator=DecisionTreeClassifier(random_state=42),
    n_estimators=10,
    random_state=42
)

bagging_model.fit(X_train, y_train)

pred_final = bagging_model.predict(X_test)


accuracy = accuracy_score(y_test, pred_final)
print("Accuracy (Bagging on Iris):", accuracy)

Output:

Accuracy (Bagging on Iris): 1.0

4. Boosting

Boosting is a sequential ensemble method designed to convert a set of weak learners into a strong learner. Each new model is trained to correct the errors made by its predecessor and the final prediction is formed by a weighted combination of all models. Boosting is highly effective in reducing bias and improving predictive accuracy.

Unlike bagging, boosting trains models sequentially which allows each successive model to focus more on the difficult cases that previous models mispredicted. This makes it particularly powerful for datasets where simple models underperform.
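The sequential error-correcting idea can be sketched as a hand-rolled gradient boosting loop for squared error, where each shallow tree is fitted to the residuals of the ensemble built so far. The toy sine data, the tree depth, the learning rate and the number of stages are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy 1-D regression problem: a noisy sine wave (illustrative data)
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 6, size=200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
n_stages = 50

# Stage 0: start from a constant prediction (the mean of the targets)
prediction = np.full_like(y, y.mean())
mse_history = [np.mean((y - prediction) ** 2)]
trees = []

for _ in range(n_stages):
    residuals = y - prediction  # what the current ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residuals)      # the weak learner models those errors
    prediction += learning_rate * tree.predict(X)  # small corrective step
    trees.append(tree)
    mse_history.append(np.mean((y - prediction) ** 2))

print("Training MSE before boosting:", mse_history[0])
print("Training MSE after boosting :", mse_history[-1])
```

Each stage only nudges the prediction by learning_rate times the new tree's output, which is why many shallow trees are needed and why the training error falls gradually rather than in one step.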

Implementation

Here we implement the Gradient Boosting ensemble method for regression using a heart disease dataset.

  • A GradientBoostingRegressor model is created with n_estimators=100 (100 boosting stages), learning_rate=0.1 to control the contribution of each tree and max_depth=3 to limit tree complexity.
  • The model is then trained on the training data.
  • Predictions are generated on the test set and model performance is evaluated using Mean Squared Error (MSE).

Python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.ensemble import GradientBoostingRegressor

df = pd.read_csv("Your dataset")

X = df.drop("target", axis=1)
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

boosting_model = GradientBoostingRegressor(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=3,
    random_state=42
)

boosting_model.fit(X_train, y_train)
pred_final = boosting_model.predict(X_test)
mse = mean_squared_error(y_test, pred_final)
print("Mean Squared Error (Boosting):", mse)

Output:

Mean Squared Error (Boosting): 0.07407866489977881

5. Stacking Ensemble Method

Stacking uses the predictions of multiple base models as inputs to a meta-learner, which produces the final predictions. Unlike bagging and boosting, which usually use homogeneous base learners, stacking often uses heterogeneous models to capture diverse patterns in the data. It can be used for both classification and regression problems.

  • Base models are trained independently on the training dataset.
  • The predictions of these base models are stacked together to form a new feature set.
  • A meta-learner is trained on these stacked predictions to generate the final output.
  • This two-level approach allows the meta-learner to capture relationships and patterns missed by individual base models.
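The steps above can be sketched with scikit-learn alone, using cross_val_predict to build out-of-fold meta-features so that the meta-learner never sees predictions a base model made on its own training rows. The synthetic data, the two base models and the fold count are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_predict, train_test_split

# Synthetic stand-in data (assumption)
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

base_models = [
    LinearRegression(),
    RandomForestRegressor(n_estimators=50, random_state=42),
]

# Meta-features for training: out-of-fold predictions, one column per base model
meta_train = np.column_stack(
    [cross_val_predict(model, X_train, y_train, cv=4) for model in base_models]
)

# Meta-features for testing: refit each base model on all training data, then predict
meta_test = np.column_stack(
    [model.fit(X_train, y_train).predict(X_test) for model in base_models]
)

# The meta-learner maps the stacked predictions to the final output
meta_model = LinearRegression()
meta_model.fit(meta_train, y_train)
stacked_pred = meta_model.predict(meta_test)

stacking_mse = mean_squared_error(y_test, stacked_pred)
print("Mean Squared Error (manual stacking):", stacking_mse)
print("Meta-learner weights:", meta_model.coef_)
```

The fitted coefficients show how much the meta-learner relies on each base model's prediction.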

Implementation

Here we build a stacking ensemble regression model using multiple base learners and a meta-learner to improve prediction accuracy.

  • The dataset is loaded, features and target are separated and the data is split into training and testing sets.
  • Three base models (Linear Regression, XGBoost Regressor and Random Forest Regressor) are defined and passed to the stacking() function from the vecstack library, which generates new meta-features using 4-fold cross-validation.
  • A Linear Regression meta-model is trained on these stacked features and used to make final predictions on the test data.

Python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
import xgboost as xgb
from vecstack import stacking

df = pd.read_csv("Your dataset")
X = df.drop("target", axis=1)
y = df["target"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model_1 = LinearRegression()
model_2 = xgb.XGBRegressor(eval_metric='rmse', random_state=42)
model_3 = RandomForestRegressor(n_estimators=100, random_state=42)

all_models = [model_1, model_2, model_3]

s_train, s_test = stacking(
    all_models, X_train, y_train, X_test,  
    regression=True, n_folds=4, shuffle=True, random_state=42
)

meta_model = LinearRegression()
meta_model.fit(s_train, y_train)


pred_final = meta_model.predict(s_test)

mse = mean_squared_error(y_test, pred_final)
print("Mean Squared Error (Stacking):", mse)

Output:

Mean Squared Error (Stacking): 0.020857985206334067

6. Blending Ensemble Method

Blending is similar to stacking, but instead of using the whole training dataset for the base models, a separate validation dataset is kept aside. Base models are trained on the training set and their predictions on the validation set are used as meta-features to train a second-level model (meta-learner). Because the meta-learner only sees predictions on data the base models were not trained on, this separation helps reduce overfitting and improves generalization.

Implementation

Here we implement a Blending ensemble regression technique to improve prediction accuracy using a meta-model.

  • The dataset is split into training (70%), validation (20%) and test (10%) sets to allow base models and meta-model training separately.
  • Base models (Linear Regression, XGBoost, and Random Forest) are trained on the training set and their predictions on the validation set are used as meta-features.
  • A Linear Regression meta-model is trained on these validation predictions and then used to generate final predictions on the test set.

Python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
import xgboost as xgb

df = pd.read_csv("Your dataset")
X = df.drop("target", axis=1)
y = df["target"]

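# Hold out 10% for testing, then split the remaining 90% so that the
# validation set is about 20% of the full data (0.2222 * 0.90 is roughly 0.20)
X_train_full, X_test, y_train_full, y_test = train_test_split(
    X, y, test_size=0.10, random_state=42
)
X_train, X_val, y_train, y_val = train_test_split(
    X_train_full, y_train_full, test_size=0.2222, random_state=42
)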

model_1 = LinearRegression()
model_2 = xgb.XGBRegressor(eval_metric='rmse', random_state=42)
model_3 = RandomForestRegressor(n_estimators=100, random_state=42)
base_models = [model_1, model_2, model_3]

val_preds = []
test_preds = []

for model in base_models:
    model.fit(X_train, y_train)
    val_preds.append(pd.DataFrame(model.predict(X_val)))
    test_preds.append(pd.DataFrame(model.predict(X_test)))

meta_X_val = pd.concat(val_preds, axis=1)
meta_X_test = pd.concat(test_preds, axis=1)

meta_model = LinearRegression()
meta_model.fit(meta_X_val, y_val)
final_pred = meta_model.predict(meta_X_test)

mse = mean_squared_error(y_test, final_pred)
print("Mean Squared Error (Blending):", mse)

Output:

Mean Squared Error (Blending): 0.027088923263424304
