Spearman's Rank Correlation

Last Updated : 6 Feb, 2026

Spearman’s Rank Correlation is a statistical measure used to find the strength and direction of association between two ranked variables. It checks how well the relationship between two variables can be described using a monotonic function.

  • Works on ranks instead of actual values
  • Used when data is not normally distributed
  • Suitable for ordinal data
  • Measures monotonic relationships, not just linear ones

Mathematical Intuition

Spearman’s Rank Correlation, denoted by ρ (rho), measures the strength and direction of association between two variables based on their ranks instead of actual values. Values range from -1 to +1

  • +1: perfect positive monotonic relationship i.e if one variable increases as the other increases
  • -1: perfect negative monotonic relationship i.e if one increases while the other decreases
  • 0: no monotonic relationship

Monotonic Relationship means a relationship where variables move in one direction only.

Spearman's Correlation formula

\rho = 1 - \frac{6\sum d ^{2}}{n(n^2-1)}

where:

  • d: difference between ranks of corresponding values
  • n: total number of observation

When to Use Spearman’s Rank Correlation

  • Works with non-linear data
  • Data follows a monotonic trend
  • Variables contain ranks or scores
  • Does not assume normal distribution
  • Handles ordinal data
  • Dataset contains outliers

Calculating Spearman’s Rank Correlation

Step 1: Original Data

NumberX1Y1
175
264
345
456
5810
677
7109
832
998
1021

Step 2: Convert Data into Ranks

Ranks are assigned by sorting values in ascending order. If values are tied, the average rank is assigned.

Ranking X1:

  • Sorted X1 values: 2, 3, 4, 5, 6, 7, 7, 8, 9, 10
  • The two values 7 and 7 are tied → both get rank 6.5

Ranking Y1:

  • Ranks are assigned using the same method.
  • Sorted Y1 values: 1, 2, 4, 5, 5, 6, 7, 8, 9, 10
  • The two values 5 and 5 are tied i.e both receive the average rank 4.5

Step 3: Rank Table with Differences

No.Rank X1Rank Y1d
16.54.524
25324
334.5-1.52.25
446-24
5810-24
66.57-0.50.25
710911
82200
99811
101100

\sum d^2 = 20.5

Step 4: Apply the Formula

\rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}

\rho = 1 - \frac{6 \times 20.5}{10(10^2 - 1)}

\rho = 1 - 0.12424

\rho \approx 0.88

This indicates a strong positive monotonic relationship

Calculating Spearman’s Rank Correlation in Python

Step 1: Sample Data

Python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

data = {
    'Hours_Studied': [2, 4, 6, 8, 10],
    'Marks': [50, 55, 65, 70, 90]
}

df = pd.DataFrame(data)
df

Output:

Screenshot-2026-02-05-160931
Dataset

Step 2: Apply Spearman’s Correlation

Python
spearman_corr = df['Hours_Studied'].corr(df['Marks'], method='spearman')
print("Spearman Correlation:", spearman_corr)

Output:

Spearman Correlation: 0.9999999999999999

Data shows a perfect increasing rank order. Hence, correlation value is 0.9

Step 3: Visualizing Spearman Correlation

  • Heatmap displays correlation between only the selected columns
  • Helps visually confirm strength of relationship
Python
corr_data = df[['Hours_Studied', 'Marks']].corr(method='spearman')

sns.heatmap(corr_data, annot=True, cmap='coolwarm')
plt.title("Spearman Rank Correlation Heatmap")
plt.show()

Output:

download-
Heatmap

Step 4: Scatter Plot for Monotonic Relationship

Python
plt.scatter(df['Hours_Studied'], df['Marks'])
plt.xlabel("Hours Studied")
plt.ylabel("Marks")
plt.title("Monotonic Relationship")
plt.show()

Output:

file
Scatter Plot

Scatter plot shows consistent upward trend hence confirming monotonic relationship.

Difference Between Pearson and Spearman Correlation

Lets see difference between Pearson Correlation and Spearman Correlation:

AspectPearson CorrelationSpearman Correlation
Data typeContinuousOrdinal or continuous
Uses actual valuesYesNo
Uses ranksNoYes
Assumes normal distributionYesNo
Relationship typeLinearMonotonic

Advantages

  • Works with non-linear monotonic data
  • Less affected by outliers
  • Simple to compute and interpret
  • No strict distribution assumptions

Limitations

  • Ignores actual data values
  • Cannot detect non-monotonic patterns
  • Less informative for precise numerical relationships
  • Less accurate for very small datasets

Applications

  • Ranking students based on marks
  • Comparing search result rankings
  • Measuring customer satisfaction levels
  • Behavioral and social science studies
  • Survey data analysis
  • Feature ranking in ML
  • Medical and social science studies
Comment

Explore