**Introduction to Bias and Variance**

**Bias** and **Variance** play a very important role while building a model. To frame it in simple terms, **Bias** can be interpreted as the **model error encountered on the training data**, and **Variance** as the **model error encountered on the test data**.

To understand the concepts of **Bias** and **Variance**, we first need to know what **Overfitting** and **Underfitting** are and how we can work towards rectifying them.

But wait! What exactly do **Overfitting** and **Underfitting** mean, and why do they matter so much?

Let’s break these down into steps and understand them by taking different scenarios for both **Regression **and **Classification **problems.

__Regression__

__Case -I (Degree of Polynomial =1)__

Let’s take the above example, where we have created a best fit line using a **Polynomial Linear Regression** algorithm. Here we have taken the **degree of polynomial = 1**.

An example of an equation with **degree of polynomial = 1** is **mx + b**. Models of this kind are similar to a **simple Linear Regression model**, and they will create a **straight best fit line** as shown above.

Now if you notice above, most of the points are spread **non-linearly** across the region, and clearly the **linear best fit line** won’t be able to fit those points properly. Based on the above model, if we try to calculate the residual error (check out this post on the **R-Squared** metric), we will get a high value: the summation of all the distances between the actual and predicted points is large.
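This gap between a straight line and a curve can be sketched numerically. Below is a minimal NumPy sketch on made-up non-linear (cubic) data — the data, seed, and degrees are illustrative assumptions, not the exact figures from this post:

```python
import numpy as np

# Hypothetical non-linear data, standing in for the scatter in the figure
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 30)
y = x**3 - 2 * x + rng.normal(0, 1, x.size)

def train_mse(degree):
    """Fit a polynomial of the given degree and return its training error."""
    coeffs = np.polyfit(x, y, degree)            # degree 1 fits mx + b
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

print(train_mse(1))  # straight line: large error on non-linear data
print(train_mse(3))  # cubic curve: much smaller error
```

The degree-1 line simply cannot follow the curvature, so its summed distances to the points stay large no matter how the line is placed.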

__Underfitting__

In this case, we created a model on the **training dataset**, but we are getting a very high error on both the **training** and **test/actual data**. That means our model doesn’t fit the data. This scenario is called **Underfitting**.

In case of **Underfitting**, accuracy is very low for both **training** and **test** data.

__Bias and Variance__

Let’s understand what **Bias** and **Variance** mean.

**Bias** is interpreted as the **model error encountered for the training data**, and **Variance** is interpreted as the **model error encountered for the test data**.

In this case, the error for both **training** and **test** data is high, which clearly means that our model has **High Bias** and **High Variance**.

__Case -II (Degree of Polynomial = 3)__

Now let’s increase the **degree of polynomial** to **3**; you will notice that we get a better **best fit line**, which fits the points better than in the previous case.

If you notice above, the **best fit line** is now a curve that tries to fit the points in a better way, thereby satisfying most of the training data points. Therefore, the residual error is comparatively smaller in this case.

__Bias and Variance__

This will be close to a perfect model, as it has **low training** and **testing error**. This means that it has **Low Bias** and **Low Variance**.

__Case -III (Degree of Polynomial = n)__

Now let’s increase the **degree of polynomial** to some high value **n**; you will notice that we get a **best fit line** which fits almost all the points.

__Overfitting__

In this case, each point is being fitted perfectly by the **best fit line**. This is a perfect example of **Overfitting**.

In case of **Overfitting**, the model’s accuracy is very high for **training data** but very low for **test data**. A good model should have high accuracy for both **training** and **test** data.

__Bias and Variance__

In this case, the error is **low** for **training** data but **high** for **test** data, which clearly means that our model has **Low Bias** and **High Variance**.
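We can sketch this Low Bias / High Variance case with NumPy by fitting a polynomial whose degree is close to the number of training points, then checking the error on held-out data. The data, seed, and degree 12 are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda t: t**3 - 2 * t                   # hypothetical true relationship
x_train = np.linspace(-3, 3, 15)
x_test = np.linspace(-2.9, 2.9, 15)
y_train = f(x_train) + rng.normal(0, 1, x_train.size)
y_test = f(x_test) + rng.normal(0, 1, x_test.size)

def mse(coeffs, xs, ys):
    return np.mean((np.polyval(coeffs, xs) - ys) ** 2)

overfit = np.polyfit(x_train, y_train, 12)   # degree close to the point count
good = np.polyfit(x_train, y_train, 3)

print(mse(overfit, x_train, y_train))        # very low training error (low bias)
print(mse(overfit, x_test, y_test))          # much higher test error (high variance)
```

The high-degree curve chases the noise in the training points, so its training error is tiny while the error on unseen points blows up.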

__Classification__

__Case -I__

Let’s take a classification use case where both the **training and testing errors** for the model are **high**.

**Training error = ~35%**

**Testing error = ~37%**

This has **high bias** and **high variance**, which clearly shows that it is a case of **Underfitting**.

__Case -II__

Let’s take a classification use case where both the **training and testing errors** for the model are **low**.

**Training error = <5%**

**Testing error = <5%**

This has **low bias** and **low variance**, which clearly shows that this is our most generalized model.

__Case -III__

Let’s take a classification use case where we have a **low training error** but a **high testing error**.

**Training error = <5%**

**Testing error = ~35%**

This has **low bias** and **high variance**, which clearly shows that it is a case of **Overfitting**.
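The three classification cases above can be summarized as a simple rule of thumb. The helper below is a hypothetical sketch — the function name and the `low`/`gap` thresholds are made-up values for illustration, not standard cutoffs:

```python
def diagnose(train_error, test_error, low=0.05, gap=0.10):
    """Map train/test error rates to the three cases discussed above.

    `low` and `gap` are illustrative thresholds, not standard values.
    """
    if train_error > low and test_error > low:
        return "underfitting (high bias, high variance)"
    if train_error <= low and test_error - train_error > gap:
        return "overfitting (low bias, high variance)"
    return "generalized model (low bias, low variance)"

print(diagnose(0.35, 0.37))   # Case I
print(diagnose(0.04, 0.045))  # Case II
print(diagnose(0.04, 0.35))   # Case III
```

The key signal is not any single number but the pair: both errors high means underfitting, both low means a generalized model, and a large train/test gap means overfitting.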

Now that we have understood different scenarios of **Classification** and **Regression** cases with respect to **Bias** and **Variance**, let’s see a more generalized representation of **Bias** and **Variance**.

__Generalized representation of Bias and Variance__

Let’s consider model error as the y-axis and the degree of polynomial as the x-axis.

__Underfitting Scenario__

Now let’s say that we have an **Underfitting** condition. Generally, in this case, the **degree of polynomial** will be low, and the **error** will be pretty high for both the **Training** and **Testing** data, signifying **high bias** and **high variance**. This can be seen in the graph below.

__Overfitting Scenario__

In case of an **Overfitting** condition, the **degree of polynomial** will be high, and the **error** will be low for the **Training** data but pretty high for the **Testing** data, signifying **low bias** and **high variance**.

__Generalized Scenario__

If we notice the graph, as the degree of polynomial increases, the error for both the training and testing data gradually descends. But after a certain point, the testing error skyrockets. That is the high-variance region: at those degrees of polynomial the training data is fitted almost perfectly **(Overfitting)**.
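This error-versus-degree curve can be computed directly. The sketch below sweeps polynomial degrees on made-up data (the data, seed, and degree range are illustrative assumptions) and picks the degree where the testing error bottoms out:

```python
import numpy as np

rng = np.random.default_rng(2)
f = lambda t: t**3 - 2 * t                    # hypothetical true relationship
x_train = np.linspace(-3, 3, 25)
x_test = np.linspace(-2.8, 2.8, 25)
y_train = f(x_train) + rng.normal(0, 1, x_train.size)
y_test = f(x_test) + rng.normal(0, 1, x_test.size)

train_err, test_err = [], []
degrees = list(range(1, 13))
for d in degrees:
    c = np.polyfit(x_train, y_train, d)
    train_err.append(np.mean((np.polyval(c, x_train) - y_train) ** 2))
    test_err.append(np.mean((np.polyval(c, x_test) - y_test) ** 2))

# The training error keeps shrinking as the degree grows, while the
# testing error dips and then rises again; its minimum marks the
# generalized (low bias, low variance) region.
best_degree = degrees[int(np.argmin(test_err))]
print(best_degree)
```

Plotting `train_err` and `test_err` against `degrees` reproduces the shape of the graph described above: training error only goes down, testing error is U-shaped.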

The basic aim is to find a model which will have a generalized fit for both the training and test data. The oval encompassing both the training and test error curves represents the region which would be suitable for a generalized model. This represents **low bias** and **low variance**.

Now let’s go ahead and see these scenarios of **Overfitting** and **Underfitting** with respect to **Decision Tree** and **Random Forest**.

__Decision Tree__

Now let’s say we have the above decision tree, which has been split to its complete depth based on its features. Splitting the tree to its complete depth is a scenario of the **Overfitting** condition: it will give us a very good training result, but on test data it will not be as accurate. So, we can say that a fully grown decision tree has a condition of **low bias** and **high variance**.

To mitigate such cases, we use methods like **pruning**, where we grow the decision tree only up to a certain depth to avoid **Overfitting**. This helps us convert the **high variance** to **low variance**.
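To make the depth idea concrete, here is a toy one-feature regression tree written from scratch in NumPy — a simplified sketch, not a production implementation, with made-up data and an illustrative depth limit standing in for pruning:

```python
import numpy as np

rng = np.random.default_rng(4)
f = lambda t: t**3 - 2 * t                       # hypothetical true relationship
x_train = np.sort(rng.uniform(-3, 3, 60)); y_train = f(x_train) + rng.normal(0, 1, 60)
x_test = np.sort(rng.uniform(-3, 3, 60));  y_test = f(x_test) + rng.normal(0, 1, 60)

def build(x, y, max_depth):
    """Greedy 1-D regression tree; max_depth acts as the pruning knob."""
    if max_depth == 0 or x.size <= 1:
        return float(y.mean())                   # leaf predicts the mean target
    best_sse, best_s = None, None
    for s in (x[:-1] + x[1:]) / 2:               # candidate splits between points
        l, r = y[x <= s], y[x > s]
        sse = ((l - l.mean()) ** 2).sum() + ((r - r.mean()) ** 2).sum()
        if best_sse is None or sse < best_sse:
            best_sse, best_s = sse, s
    m = x <= best_s
    return (best_s, build(x[m], y[m], max_depth - 1), build(x[~m], y[~m], max_depth - 1))

def predict(node, xi):
    while isinstance(node, tuple):
        node = node[1] if xi <= node[0] else node[2]
    return node

def mse(node, xs, ys):
    return np.mean([(predict(node, xi) - yi) ** 2 for xi, yi in zip(xs, ys)])

deep = build(x_train, y_train, max_depth=60)     # fully grown: one point per leaf
pruned = build(x_train, y_train, max_depth=3)    # depth-limited ("pre-pruned")

print(mse(deep, x_train, y_train))               # zero: memorizes the training data
print(mse(deep, x_test, y_test))                 # noticeably higher on unseen data
```

The fully grown tree drives training error to zero (low bias) while its test error stays well above that (high variance); limiting `max_depth` deliberately gives up some training accuracy to reduce that variance.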

There are also many hyperparameter tuning techniques that can tune it further. We will learn those in detail in further posts related to decision trees.

__Random Forest__

In case of **Random Forest**, we use multiple **decision trees** in parallel. Each individual tree still has the **low bias** and **high variance** property of a decision tree, but because we aggregate multiple **decision trees** trained in parallel, the **high variance** gets converted to **low variance**.

In the figure below, you can see that we have 60K records which have been split into samples for the individual decision trees. The **Random Forest** algorithm aggregates the outputs of all the various **decision trees** to give us the final aggregated outcome.
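The variance-reduction effect of aggregating many overfit models can be simulated in a few lines. Below, a 1-nearest-neighbour regressor stands in for a fully grown tree (it memorizes its sample), and averaging bootstrap copies stands in for the forest — all data and counts here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
f = lambda t: t**3 - 2 * t                    # hypothetical true relationship
x = np.linspace(-3, 3, 60)
y = f(x) + rng.normal(0, 1, x.size)

def one_nn(x_s, y_s, x_eval):
    # 1-nearest-neighbour regressor: memorizes its sample, like a fully grown tree
    return y_s[np.abs(x_eval[:, None] - x_s[None, :]).argmin(axis=1)]

single = one_nn(x, y, x)                      # one overfit model: reproduces y exactly

preds = []
for _ in range(100):
    idx = rng.integers(0, x.size, x.size)     # bootstrap sample, like each tree's data
    preds.append(one_nn(x[idx], y[idx], x))
ensemble = np.mean(preds, axis=0)             # aggregate, like a Random Forest

# Compare against the noise-free truth (known only because this data is simulated)
single_mse = np.mean((single - f(x)) ** 2)
ensemble_mse = np.mean((ensemble - f(x)) ** 2)
print(single_mse, ensemble_mse)
```

The single memorizing model carries the full noise of its training set, while averaging many such models cancels much of that noise — the same intuition behind a forest of high-variance trees producing a low-variance aggregate.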