Linear Regression Formula

Last Updated : 5 Mar, 2026

Linear regression is a statistical method, widely used in machine learning, that predicts the value of an unknown quantity from other related data values. It is used to study the relationship between a dependent variable and one or more independent variables.

In linear regression, we assume a linear relationship between the variables, which means that changes in the independent variable are associated with proportional changes in the dependent variable.

Commonly used types of regression are:

1) Simple Linear Regression: This is the simplest form, where we have one thing we're trying to predict and one thing we think might influence it. For example, we might predict someone's weight based on their height.

2) Multiple Linear Regression: Here, things get a bit more complex. We're still predicting one thing, but now we're considering multiple factors that might influence it. For instance, we might predict a person's weight based on their height, age, and maybe even their diet habits.

3) Logistic Regression: This one comes into play when we're dealing with binary outcomes, like whether someone will click on an ad or not. We're still looking at multiple factors that might play a role.

4) Ordinal Regression: Sometimes, what we're trying to predict isn't exactly numerical, but it has an order. Think of rating something from 1 to 5 stars. This kind of regression helps us predict such ordinal outcomes.

5) Multinomial Regression: When our outcome has several categories but no inherent order, like predicting someone's favorite color among several options, we turn to multinomial regression.

6) Discriminant Analysis: Similar to multinomial regression, this helps us when we have multiple categories for our outcome variable, but here, we're specifically focused on classifying cases into those categories based on the predictor variables.

Linear Regression Equation

The linear regression line equation is written in the form

y = a + bx

where,

  • x is the independent variable, plotted along the X-axis
  • y is the dependent variable, plotted along the Y-axis

The intercept value, a, and the slope of the line, b, are evaluated using the formulas given below:

\begin{array}{l}\large a~=~\frac{\sum y \sum x^{2}~-~\sum x \sum xy}{n\sum x^{2}~-~\left(\sum x\right)^{2}}\end{array}

\begin{array}{l}\large b~=~\frac{n\sum xy~-~\left(\sum x\right)\left(\sum y\right)}{n\sum x^{2}~-~\left(\sum x\right)^{2}}\end{array}

where,

  • y is the dependent variable, which lies along the Y-axis.
  • a is the y-intercept.
  • b is the slope of the regression line.
  • x is the independent variable, which lies along the X-axis.
  • n is the number of data points.
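The two formulas above translate directly into code. The following is a minimal sketch, assuming the five sums have already been computed; the function name `fit_line_from_sums` and the sample data are hypothetical and chosen only for illustration.

```python
def fit_line_from_sums(n, sum_x, sum_y, sum_x2, sum_xy):
    """Return (a, b) for the line y = a + b*x from the summary sums."""
    # Both formulas share the same denominator: n*sum(x^2) - (sum(x))^2.
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator  # intercept
    b = (n * sum_xy - sum_x * sum_y) / denominator       # slope
    return a, b


# Hypothetical data: the points lie exactly on y = 1 + 2x.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
a, b = fit_line_from_sums(
    n=len(xs),
    sum_x=sum(xs),
    sum_y=sum(ys),
    sum_x2=sum(x * x for x in xs),
    sum_xy=sum(x * y for x, y in zip(xs, ys)),
)
print(a, b)  # 1.0 2.0
```

Because the sample points fall exactly on a line, the fit recovers that line's intercept and slope.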

Properties of Linear Regression

For a linear regression line with regression parameters b₀ and b₁, the following properties hold:

  • The linear regression line minimizes the sum of squared differences between observed values and predicted values.
  • The linear regression line always passes through the point given by the means of the X and Y variable values.
  • The linear regression constant (b₀) is equal to the y-intercept of the linear regression line.
  • The linear regression coefficient (b₁) is the slope of the regression line.
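These properties can be checked numerically. The sketch below fits a line to a small hypothetical data set, then tests that the line passes through the point of means and that nudging either parameter increases the sum of squared differences. The helper names and the data are assumptions made for this illustration.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    denominator = n * sum_x2 - sum_x ** 2
    b0 = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b1 = (n * sum_xy - sum_x * sum_y) / denominator
    return b0, b1


def sum_squared_error(xs, ys, b0, b1):
    """Sum of squared differences between observed and predicted values."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 4.0, 5.0, 8.0]
ys = [2.0, 3.5, 4.0, 6.5, 9.0]
b0, b1 = fit_line(xs, ys)

# Property: the line passes through (mean of x, mean of y).
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
passes_through_mean = abs((b0 + b1 * mean_x) - mean_y) < 1e-9

# Property: any small change to b0 or b1 increases the squared error.
best = sum_squared_error(xs, ys, b0, b1)
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
is_minimum = all(
    best < sum_squared_error(xs, ys, b0 + d0, b1 + d1) for d0, d1 in nudges
)

print(passes_through_mean, is_minimum)  # True True
```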

Linear Regression Line

The least squares method is the most common method used to fit a regression line in the X-Y graph. In this process, we determine the line of best fit by minimizing the sum of the squares of the vertical deviations from each data point to the line.

For any point that lies exactly on the fitted line, its vertical deviation is zero. The linear regression line is shown in the figure below.

[Figure: X and Y linear regression line]
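As a sketch of where the least squares line comes from, the standard derivation (not spelled out in the original text) minimizes the sum of squared vertical deviations, written here as S, over the intercept a and the slope b of the line y = a + bx:

\begin{array}{l}\large S(a,b)~=~\sum \left(y-a-bx\right)^{2}\end{array}

Setting both partial derivatives to zero gives two equations in a and b:

\begin{array}{l}\large \frac{\partial S}{\partial a}~=~-2\sum \left(y-a-bx\right)~=~0~\Rightarrow~\sum y~=~na+b\sum x\end{array}

\begin{array}{l}\large \frac{\partial S}{\partial b}~=~-2\sum x\left(y-a-bx\right)~=~0~\Rightarrow~\sum xy~=~a\sum x+b\sum x^{2}\end{array}

Solving these two equations simultaneously for a and b yields the intercept and slope formulas given in the Linear Regression Equation section.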

Regression Coefficient

The equation of the linear regression line is:

Y = B0 + B1X

where,

  • B0 is a constant.
  • B1 is the regression coefficient.

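The formula for the regression coefficient B1 is,

\begin{array}{l}\large B_{1}~=~b_{1}~=~\frac{\sum \left(x_{i}-\bar{x}\right)\left(y_{i}-\bar{y}\right)}{\sum \left(x_{i}-\bar{x}\right)^{2}}\end{array}

where,

  • xᵢ and yᵢ are the observed data values.
  • x̄ and ȳ are the mean values of x and y.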
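The mean-deviation form of the regression coefficient can be sketched as follows. The function name and sample data are hypothetical. The constant B0 is obtained from the fact that the line passes through the point of means, which is a standard identity rather than something stated in this section.

```python
from statistics import mean


def slope_from_deviations(xs, ys):
    """Regression coefficient B1 from deviations about the means."""
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


# Hypothetical data, for illustration only.
xs = [2, 4, 6, 8]
ys = [1, 3, 7, 9]

b1 = slope_from_deviations(xs, ys)
# The line passes through (x_bar, y_bar), so B0 = y_bar - B1 * x_bar.
b0 = mean(ys) - b1 * mean(xs)
print(round(b1, 3), round(b0, 3))  # 1.4 -2.0
```

This form gives the same slope as the sum-based formula for b; it is often preferred in practice because subtracting the means first keeps the intermediate numbers small.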

Applications of Linear Regression

Various uses of linear regression are:

  • It is used in market research and the study of customer survey results.
  • It is used for studying the performance of the engines of automobiles.
  • It is used in deciding the effective price of any goods.
  • It is used in astronomy.

Error in Linear Regression Formula

The standard error about the regression line measures the average amount by which the regression equation over-predicts or under-predicts the observed values. Standard error in this case is denoted by 'SE'. The higher the coefficient of determination, the lower the standard error, and hence the more accurate the predictions.
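The relationship between the standard error and the coefficient of determination can be illustrated with a short sketch. It assumes the common definitions for simple linear regression, which the text above does not spell out: SE = sqrt(SSE / (n - 2)) and R² = 1 - SSE / SST, where SSE is the sum of squared residuals and SST is the total sum of squares. The function name and both data sets are hypothetical.

```python
import math


def regression_errors(xs, ys):
    """Return (standard_error, r_squared) for a simple linear fit."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    # SSE: squared vertical deviations from the fitted line.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # SST: squared deviations from the mean of y.
    sst = sum((y - y_bar) ** 2 for y in ys)
    standard_error = math.sqrt(sse / (n - 2))  # assumed definition of SE
    r_squared = 1 - sse / sst
    return standard_error, r_squared


# Two hypothetical data sets: one close to a line, one more scattered.
xs = [1, 2, 3, 4, 5]
tight_ys = [2.1, 3.9, 6.2, 7.8, 10.1]
noisy_ys = [4, 2, 7, 6, 11]

se_tight, r2_tight = regression_errors(xs, tight_ys)
se_noisy, r2_noisy = regression_errors(xs, noisy_ys)
print(f"tight: SE = {se_tight:.3f}, R^2 = {r2_tight:.3f}")
print(f"noisy: SE = {se_noisy:.3f}, R^2 = {r2_noisy:.3f}")
```

The data set that lies closer to a straight line has the higher coefficient of determination and the lower standard error.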

Solved Example Questions on Linear Regression

Question 1: Find the linear regression equation for the given data:

x    y
3    8
9    6
5    4
3    2

Calculating intercept and slope value.

x          y          x²           xy
3          8          9            24
9          6          81           54
5          4          25           20
3          2          9            6
∑x = 20    ∑y = 20    ∑x² = 124    ∑xy = 104

Using formula,

\begin{array}{l}\large a~=~\frac{\sum y \sum x^{2}~-~\sum x \sum xy}{n\sum x^{2}~-~\left(\sum x\right)^{2}}\end{array}

a = {20 (124) - 20 (104)} / {4 (124) - 400}
a = 400/96 = 4.17
\begin{array}{l}\large b~=~\frac{n\sum xy~-~\left(\sum x\right)\left(\sum y\right)}{n\sum x^{2}~-~\left(\sum x\right)^{2}}\end{array}
b = {4 (104) - 20 (20)} / {4 (124) - 400}
b = 16/96 = 0.167

So, the linear regression equation is y = a + bx → y = 4.17 + 0.167x.
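The working for Question 1 can be reproduced with a few lines of code. This is a minimal sketch that rebuilds the x² and xy columns from the given data and then applies the intercept and slope formulas; the variable names are chosen for illustration.

```python
# Data from Question 1.
xs = [3, 9, 5, 3]
ys = [8, 6, 4, 2]
n = len(xs)

# Build the extra table columns, then total each column.
x_squared = [x * x for x in xs]
xy = [x * y for x, y in zip(xs, ys)]
sum_x, sum_y = sum(xs), sum(ys)
sum_x2, sum_xy = sum(x_squared), sum(xy)
print(sum_x, sum_y, sum_x2, sum_xy)  # 20 20 124 104

# Apply the intercept and slope formulas.
denominator = n * sum_x2 - sum_x ** 2
a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
b = (n * sum_xy - sum_x * sum_y) / denominator
print(round(a, 2), round(b, 3))  # 4.17 0.167
```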

Question 2: Find the linear regression equation for the given data:

x    y
4    6
7    5
3    8
1    3

Calculating intercept and slope value.

x          y          x²          xy
4          6          16          24
7          5          49          35
3          8          9           24
1          3          1           3
∑x = 15    ∑y = 22    ∑x² = 75    ∑xy = 86

Using formula,

\begin{array}{l}\large a~=~\frac{\sum y \sum x^{2}~-~\sum x \sum xy}{n\sum x^{2}~-~\left(\sum x\right)^{2}}\end{array}
= (22 (75) - 15 (86)) / (4 (75) - 225)
= 360/75
= 4.8

\begin{array}{l}\large b~=~\frac{n\sum xy~-~\left(\sum x\right)\left(\sum y\right)}{n\sum x^{2}~-~\left(\sum x\right)^{2}}\end{array}
= (4 (86) - 15 (22)) / (4 (75) - 225)
= 14/75
= 0.1867

So, the linear regression equation is, y = 4.8 + 0.187x.

Question 3: Find the intercept of the linear regression line if ∑x = 25, ∑y = 20, ∑x² = 90, ∑xy = 150, and n = 5.

Using formula,
\begin{array}{l}\large a=\frac{\sum y \sum x^{2}-\sum x \sum xy}{n\sum x^{2}-\left(\sum x\right)^{2}}\end{array}

= (20 (90) - 25 (150)) / (5 (90) - 625)
= -1950/-175
= 11.14

Question 4: Find the intercept of the linear regression line if ∑x = 30, ∑y = 27, ∑x² = 110, ∑xy = 190, and n = 4.

Using formula,
\begin{array}{l}\large a=\frac{\sum y \sum x^{2}-\sum x \sum xy}{n\sum x^{2}-\left(\sum x\right)^{2}}\end{array}

= (27 (110) - 30 (190)) / (4 (110) - 900)
= -2730/-460
= 5.93

Question 5: Find the slope of the linear regression line if ∑x = 10, ∑y = 16, ∑x² = 60, ∑xy = 120, and n = 4.

Using formula,
\begin{array}{l}\large b=\frac{n\sum xy-\left(\sum x\right)\left(\sum y\right)}{n\sum x^{2}-\left(\sum x\right)^{2}}\end{array}

= (4 (120) - 10 (16)) / (4 (60) - 100)
= 320/140
= 2.29

Question 6: Find the slope of the linear regression line if ∑x = 40, ∑y = 32, ∑x² = 130, ∑xy = 210, and n = 4.

Using formula,

\begin{array}{l}\large b=\frac{n\sum xy-\left(\sum x\right)\left(\sum y\right)}{n\sum x^{2}-\left(\sum x\right)^{2}}\end{array}

= (4 (210) - 40 (32)) / (4 (130) - 1600)
= -440/-1080
= 0.407

Question 7: Find the intercept and slope of the linear regression line if ∑x = 50, ∑y = 44, ∑x² = 150, ∑xy = 230, and n = 4.

Using formula,

\begin{array}{l}\large a=\frac{\sum y \sum x^{2}-\sum x \sum xy}{n\sum x^{2}-\left(\sum x\right)^{2}}\end{array}

= (44 (150) - 50 (230)) / (4 (150) - 2500)
= -4900/-1900
= 2.58

\begin{array}{l}\large b=\frac{n\sum xy-\left(\sum x\right)\left(\sum y\right)}{n\sum x^{2}-\left(\sum x\right)^{2}}\end{array}

= (4 (230) - 50 (44)) / (4 (150) - 2500)
= -1280/-1900
= 0.674
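When only the sums are given, as in Questions 3 to 7, the arithmetic can be checked exactly with rational numbers before rounding. The sketch below uses `fractions.Fraction` from the standard library; the function name and the dictionary layout are assumptions made for this illustration.

```python
from fractions import Fraction


def intercept_and_slope(n, sum_x, sum_y, sum_x2, sum_xy):
    """Exact intercept a and slope b of y = a + b*x from the given sums."""
    denominator = n * sum_x2 - sum_x ** 2
    a = Fraction(sum_y * sum_x2 - sum_x * sum_xy, denominator)
    b = Fraction(n * sum_xy - sum_x * sum_y, denominator)
    return a, b


# Sums as stated in Questions 3 to 7.
questions = {
    3: dict(n=5, sum_x=25, sum_y=20, sum_x2=90, sum_xy=150),
    4: dict(n=4, sum_x=30, sum_y=27, sum_x2=110, sum_xy=190),
    5: dict(n=4, sum_x=10, sum_y=16, sum_x2=60, sum_xy=120),
    6: dict(n=4, sum_x=40, sum_y=32, sum_x2=130, sum_xy=210),
    7: dict(n=4, sum_x=50, sum_y=44, sum_x2=150, sum_xy=230),
}

for number, sums in questions.items():
    a, b = intercept_and_slope(**sums)
    # Show the exact fraction alongside its decimal approximation.
    print(f"Question {number}: a = {a} ~ {float(a):.3f}, b = {b} ~ {float(b):.3f}")
```

Working with exact fractions postpones rounding to the final step, which makes it easier to spot a rounding slip in a hand calculation.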
