__Introduction__

**R-Squared** and **Adjusted R-Squared** are the two key metrics used to evaluate the accuracy of a regression model. We will look at each in detail in the subsequent sections.

There are various techniques to check the accuracy of different kinds of problems. For classification problems, we use the confusion matrix, F1-score, precision, recall, etc. You can check the detailed post on classification performance metrics **here**.

__R-Squared__

The formula for R-Squared is:

\mathbf{R}^\mathbf{2}\ =\ \mathbf{1}\ -\ \frac{{SS}_{res}}{{SS}_{tot}}

Where,

{SS}_{res} = the residual sum of squares

{SS}_{tot} = the total sum of squares

**Best Fit Line**

To understand what {SS}_{res} is, let’s use a graph.

The blue dots in the graph are the **actual** points. The double-ended arrow between each blue dot and the diagonal line (the best fit line) shows the difference between the **predicted** and **actual** point. This is the **error**, or **residual**. The sum of the squares of all these differences between the **actual** and the **predicted** points is what we call {SS}_{res}:

{SS}_{res}\ =\ \sum\left({y}_{i}-{\hat{y}}_{i}\right)^{2}

Here {y}_{i} are the actual points and {\hat{y}}_{i} are the predicted points.

Now let’s understand using the graph shown below.

**Average Fit Line**

In the above figure, you can see that instead of finding the **best fit line**, the **average output line** is taken. The blue dots in the graph are the **actual** points.

The double-ended arrow between each blue dot and the **average output line** gives the difference between the **actual** point and the **average** output. The sum of the squares of all these differences is what we call {SS}_{tot}:

{SS}_{tot}\ =\ \sum\left({y}_{i}-{y}_{average}\right)^{2}

So, substituting the values of {SS}_{res} and {SS}_{tot} into the **R^{2}** equation, we will usually get a value somewhere between 0 and 1.

\mathbf{R}^\mathbf{2}\ =\ \mathbf{1}\ -\ \frac{{SS}_{res}}{{SS}_{tot}},\ \ and\ since\ {SS}_{res}<{SS}_{tot}:\ \ 0<\mathbf{R}^\mathbf{2}<1

The logic behind this is that the error {SS}_{tot} will always be relatively high, as it is measured against the average fit, whereas the error {SS}_{res} for the best fit line will be comparatively lower than {SS}_{tot}.

Therefore, \frac{{SS}_{res}}{{SS}_{tot}} will be a small value, and subtracting it from 1 gives us a value somewhere between 0 and 1.

If the **R^{2}** value is close to 1, then our best fit line fits the data quite well.
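To make the two sums concrete, here is a minimal sketch in Python (the data points are made up for illustration, and NumPy is assumed to be available) that computes {SS}_{res}, {SS}_{tot}, and R^{2} by hand:

```python
import numpy as np

# Hypothetical actual outputs and predictions from a fitted regression line.
y_actual = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y_pred = np.array([3.2, 4.8, 7.1, 8.9, 11.3])

ss_res = np.sum((y_actual - y_pred) ** 2)           # residual sum of squares
ss_tot = np.sum((y_actual - y_actual.mean()) ** 2)  # total sum of squares (vs. the average)

r_squared = 1 - ss_res / ss_tot
print(r_squared)  # close to 1, since the predictions track the actual points well
```

Because the predictions here are much closer to the actual points than the plain average is, the ratio is small and R^{2} lands near 1.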

**But wait! Can we encounter a scenario where the R^{2} value is less than 0?**

Yes, the value of **R^{2}** can be less than 0 in cases where the output of the **best fit line** is worse than that of the **average output line**. That means {SS}_{res} > {SS}_{tot}.

Substituting the values to the **R ^{2 }**equation below:

\mathbf{R}^\mathbf{2}\ =\ \mathbf{1}\ -\ \frac{{SS}_{res}}{{SS}_{tot}}\ =\ 1\ -\ \frac{larger\ value}{smaller\ value}\ =\ negative\ value

This means that the model we have created is not a good model at all. Therefore, **R^{2}** is used to check the **goodness of fit**.
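As a quick sanity check, here is a small Python sketch (with made-up numbers) where the "fit" is worse than simply predicting the average, producing a negative R^{2}:

```python
import numpy as np

y_actual = np.array([2.0, 4.0, 6.0, 8.0])
# A deliberately bad "fit": the line slopes the wrong way.
y_bad = np.array([8.0, 6.0, 4.0, 2.0])

ss_res = np.sum((y_actual - y_bad) ** 2)            # 36 + 4 + 4 + 36 = 80
ss_tot = np.sum((y_actual - y_actual.mean()) ** 2)  # 9 + 1 + 1 + 9 = 20

r_squared = 1 - ss_res / ss_tot
print(r_squared)  # -3.0: worse than just predicting the average
```

Since {SS}_{res} (80) exceeds {SS}_{tot} (20), the fraction is greater than 1 and R^{2} goes negative.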

__Drawback of R__^{2}

There is a drawback to **R^{2}** that often makes it difficult to judge the accuracy of a model from it alone.

Let’s say we have a **simple linear regression** model with one independent feature and the equation **y = ax + b**. Now we add a few more independent features. Our new model would be a **multiple linear regression** model with an equation like **y = ax_{1} + bx_{2} + cx_{3} + d**.

So, as the number of independent features increases, our **R^{2}** also increases.

**How does the value of R^{2} increase?**

Every time we add an independent feature, the linear regression algorithm assigns a coefficient to that feature. For example, the coefficients in the equation above are **a, b, c**, which got added when the new features **x_{1}, x_{2}, x_{3}** were introduced to the model.

The linear regression algorithm assigns the coefficients in such a way that {SS}_{res} stays the same or decreases whenever we add a new independent feature; it never increases.

If we substitute this logic into the **R^{2}** equation:

\mathbf{R}^\mathbf{2}\ =\ \mathbf{1}\ -\ \frac{{SS}_{res}\ (decreasing\ with\ each\ added\ feature)}{{SS}_{tot}\ (a\ constant,\ larger\ value)}\ =\ close\ to\ 1

**This sounds perfect, right? Not really!**

As we increase the number of independent features in the model, the **R^{2}** value keeps increasing even when the new features are not correlated with the dependent variable.

Chances are that a feature we include is a complete one-off. It might have no relation to the target (dependent) variable, but it still gets a coefficient that contributes to the output, because the linear regression algorithm assigns a coefficient to every feature present in the model.

For example, suppose we are predicting the age of students, and one of the features in our model is the **contact number** of each student. This feature clearly has no correlation with age, but it may still get a coefficient that contributes to the output, thereby increasing the overall **R^{2}** of the model.

This clearly means that **R^{2}** tells us nothing about the correlation between the independent features and the dependent variable. It simply increases whenever we add a new feature to the model.
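This behaviour is easy to reproduce. The sketch below (assuming NumPy and scikit-learn are available; the data is synthetic) fits a model with one informative feature, then adds a pure-noise feature and shows that the training R^{2} does not decrease:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=(n, 1))
y = 3 * x1[:, 0] + rng.normal(scale=0.5, size=n)  # y depends only on x1

# Model with the single informative feature.
r2_one = LinearRegression().fit(x1, y).score(x1, y)

# Append a pure-noise feature with no relation to y.
x2 = np.hstack([x1, rng.normal(size=(n, 1))])
r2_two = LinearRegression().fit(x2, y).score(x2, y)

print(r2_two >= r2_one)  # the training R^2 never decreases when a feature is added
```

This is a general property of least squares on the training data: a larger feature set can always reproduce the smaller model's fit by giving the extra feature a zero coefficient, so {SS}_{res} cannot go up.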

To prevent such scenarios, we use **Adjusted R^{2}**.

__Adjusted R-Squared__

The formula for **Adjusted R-Squared** is as follows:

{Adjusted\ -\ R}^{2\ }=\ 1\ -\ (1-R^2)\ \frac{(N-1)}{N-P-1}

Where,

R^2\ =\ R\text{-}Squared\ value

P\ =\ number\ of\ independent\ features

N\ =\ Sample\ size\ of\ the\ dataset.

The **Adjusted R^{2}** has a penalizing factor. It penalizes the model for adding independent variables that don’t contribute to it in any way or are not correlated to the dependent variable.
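The formula translates directly into a few lines of Python. The helper below is a sketch (the R^{2} values, sample size, and feature counts passed in are hypothetical) showing how the penalty can wipe out a tiny R^{2} gain from one extra, uninformative feature:

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for a model with p independent features and n samples."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# A tiny R^2 bump from one extra feature (0.900 -> 0.901)...
base = adjusted_r_squared(0.900, n=50, p=3)
bumped = adjusted_r_squared(0.901, n=50, p=4)

print(base, bumped)          # the adjusted value actually drops
print(bumped < base < 0.900)  # and both sit below the raw R^2
```

Even though the raw R^{2} went up, the shrinking denominator N-P-1 inflates the penalty term faster, so the adjusted value falls.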

To understand this penalizing factor let’s divide it into 2 cases:

__Case – I__

Let’s say we increase the number of independent features (**P**) in the model, and these features **do not really contribute to the model or are not correlated to the dependent variable**.

Let’s substitute this logic into the **Adjusted R-Squared** equation. The value of **N-P-1** decreases as **P** increases, so the value of \frac{(\mathbf{N}-\mathbf{1})}{\mathbf{N}-\mathbf{P}-\mathbf{1}} increases.

There is one thing we need to understand here. As we add new features, the **R-Squared** value will increase, but this increase will be insignificant because the newly added features are not correlated to the dependent variable. So, (1-R^2) will not decrease much.

Multiplying the increased \frac{(\mathbf{N}-\mathbf{1})}{\mathbf{N}-\mathbf{P}-\mathbf{1}} by the barely changed (1-R^2) therefore gives a larger value, and subtracting it from 1 gives us a smaller value:

{\mathbf{Adjusted}\ -\ \mathbf{R}}^{\mathbf{2}\ }=\ \mathbf{1}\ -\ (\mathbf{1}-\mathbf{R}^\mathbf{2})\ \frac{(\mathbf{N}-\mathbf{1})}{\mathbf{N}-\mathbf{P}-\mathbf{1}}

=\mathbf{1}-(\mathbf{increasing}\ \mathbf{value}\ \mathbf{less}\ \mathbf{than}\ \mathbf{1})

=\mathbf{smaller}\ \mathbf{value}

This is how **Adjusted R-Squared **penalizes when the features are not correlated to the dependent variable.

__Case – II__

Now let’s say we are adding features which are **highly correlated to the dependent variable**. In this case **R^{2}** will be high, and the resulting drop in (1-R^2) will outweigh the increase in the \frac{(\mathbf{N}-\mathbf{1})}{\mathbf{N}-\mathbf{P}-\mathbf{1}} value.

So, **(1 – R^{2})** will be a small value, and multiplying it by the \frac{(\mathbf{N}-\mathbf{1})}{\mathbf{N}-\mathbf{P}-\mathbf{1}} value still gives us a small value. Subtracting this from 1 gives us an **Adjusted R-Squared** value that is higher than in the previous case.

Substituting this logic to the **Adjusted R-Squared **equation

{\mathbf{Adjusted}\ -\ \mathbf{R}}^{\mathbf{2}\ }=\ \mathbf{1}\ -\ (\mathbf{1}-\mathbf{R}^\mathbf{2})\ \frac{(\mathbf{N}-\mathbf{1})}{\mathbf{N}-\mathbf{P}-\mathbf{1}}

=1\ -\ (small\ value)(slightly\ larger\ value)

=\ \mathbf{1}\ -\ \mathbf{smaller}\ \mathbf{value}\ =\ \mathbf{Increased}\ \mathbf{Adjusted}\ \mathbf{R}^\mathbf{2}\ \mathbf{value}

So this signifies that when the independent features are correlated to the dependent variable, the Adjusted R-Squared value goes up.

__Conclusion__

- Whenever we add an independent feature to the model, the **R-Squared** value will always increase, even if the feature is not correlated to the dependent variable; it will never decrease. On the other hand, **Adjusted R-Squared** increases only when the independent feature is correlated to the dependent variable.
- The **Adjusted R-Squared** value will always be less than or equal to the **R-Squared** value.
