Poisson regression is a statistical technique for modeling count data, where the outcome variable represents the number of times an event occurs in a fixed interval of time, space, or another unit of exposure. It is most appropriate when the response variable takes non-negative integer values (0, 1, 2, ...) and, for given values of the predictors, events occur independently at a constant average rate.
Poisson regression is used when you want to predict counts, that is, answers to "how many" questions, which can never be negative. Typical examples include:
- Number of customers arriving at a store
- Number of clicks on a website
- Number of defects in a batch
Key Assumptions of Poisson Regression
- The response variable is a count such as number of visits, accidents, or purchases.
- Counts follow a Poisson distribution.
- The mean and variance of the distribution are equal.
- Observations are independent of each other.
- For given values of the predictors, events occur at a constant average rate over the observation interval.
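The equal mean and variance assumption can be checked informally on a sample of counts. The snippet below is a minimal sketch on simulated data; the rate of 4, the sample size, and the seed are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, used only for reproducibility

# Hypothetical sample: 10,000 counts drawn from a Poisson distribution with rate 4
counts = rng.poisson(lam=4.0, size=10_000)

mean = counts.mean()
variance = counts.var(ddof=1)  # sample variance

# A ratio close to 1 is consistent with the equal mean-variance assumption
dispersion_ratio = variance / mean

print(f"mean={mean:.2f}, variance={variance:.2f}, ratio={dispersion_ratio:.2f}")
```

A ratio well above 1 would suggest overdispersion. In a regression setting the assumption applies conditionally on the predictors, so this marginal check is only a rough first look.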
Mathematical Formulation of Poisson Regression
In Poisson regression, the output Y is assumed to follow a Poisson distribution:
P(Y = y) = \frac{e^{-\lambda} \lambda^{y}}{y!}
Where:
- Y is the count variable
- y is a particular count
- \lambda is the expected rate of occurrence
- e is Euler’s number (approximately 2.718)
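Instead of modeling Y directly, we model the log of the expected value as a linear function of the predictors:
\log(\lambda) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n
Or, in exponential form:
\lambda = e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n}
Where:
- \lambda : The expected count
- X_i : Independent variables
- \beta_i : Coefficients to be learned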
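To make the formulas concrete, here is a small worked sketch using only the standard library. The coefficient values (0.5 and 0.3) and the predictor value (2.0) are assumptions picked for illustration, and `poisson_pmf` is a hypothetical helper name, not part of any library.

```python
import math

# Hypothetical coefficients and a single predictor value, for illustration only
beta_0, beta_1 = 0.5, 0.3
x_1 = 2.0

# Log link: log(lambda) = beta_0 + beta_1 * x_1
log_lam = beta_0 + beta_1 * x_1
lam = math.exp(log_lam)  # expected count for this predictor value


def poisson_pmf(y: int, lam: float) -> float:
    """P(Y = y) for a Poisson distribution with rate lam."""
    return math.exp(-lam) * lam**y / math.factorial(y)


print(f"lambda = {lam:.3f}")
for y in range(6):
    print(f"P(Y = {y}) = {poisson_pmf(y, lam):.4f}")
```

Because the link is logarithmic, increasing x_1 by one unit multiplies the expected count by e^{\beta_1} rather than adding a fixed amount.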
When to Use Poisson Regression
Poisson Regression is appropriate when:
- The dependent variable is a count (e.g., 0, 1, 2, …)
- Counts are not negative.
- The counts follow a Poisson distribution (i.e., mean ≈ variance).
- The observations are independent.
Implementation of Poisson Regression in Python
Step 1: Import Required Libraries
We start by importing NumPy for generating data, Statsmodels for building the Poisson regression model, and Matplotlib for plotting.
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
Step 2: Create Sample Data
We generate input values (x) and simulate corresponding count data (y) that follow a Poisson distribution, where counts increase with x.
np.random.seed(42)
x = np.linspace(0, 10, 100)
X = sm.add_constant(x)
lambda_ = np.exp(0.5 + 0.3 * x)
y = np.random.poisson(lambda_)
Step 3: Fit the Poisson Regression Model
We use Statsmodels' GLM class with a Poisson family to build the model, then call fit() to estimate the coefficients.
model = sm.GLM(y, X, family=sm.families.Poisson())
results = model.fit()
Step 4: View Model Summary
This gives us details about the model, including the learned coefficients and model performance.
print(results.summary())
Step 5: Predict and Plot the Results
We use the model to predict counts and then plot the actual data vs. the fitted curve.
y_pred = results.predict(X)
plt.scatter(x, y, color='orange', label='Observed')
plt.plot(x, y_pred, color='red', label='Poisson Fit')
plt.xlabel('x')
plt.ylabel('Count (y)')
plt.title('Poisson Regression')
plt.legend()
plt.show()
Poisson Regression vs Linear Regression
| Feature | Linear Regression | Poisson Regression |
|---|---|---|
| Output | Continuous values | Non-negative counts |
| Assumption | Normal distribution | Poisson distribution |
| Link Function | Identity | Log |
| Use Cases | Sales, prices, temperature | Count events, incidents |
Real-World Applications of Poisson Regression
Poisson regression is widely used in domains where the outcome is a count of events over time, space, or groups. Below are some practical use cases:
- Healthcare: Estimating the number of patient admissions to a hospital per day or the number of new disease cases reported in a region each month.
- Transportation: Predicting the number of traffic accidents at a particular intersection per week.
- Customer Support: Analyzing the number of customer service calls received by a company daily.
- E-commerce and Marketing: Modeling the number of clicks on a digital advertisement or the number of purchases made by a user in a given time.
- Sports Analytics: Forecasting the number of goals scored by a team in a match or a player’s number of successful passes.
- Website Analytics: Measuring the number of page visits or downloads occurring on a website per hour.
- Insurance: Estimating the number of claims filed in a given policy period based on customer characteristics.