Introduction
Collinearity is a condition in which two features in the data are highly correlated with each other. In that situation, we can detect the collinearity with a correlation heat map and drop one of the two features based on the result. Multicollinearity, on the other hand, is harder to deal with: one feature may be correlated with a combination of several other features at once, so no single pair necessarily stands out in a heat map. This makes multicollinearity difficult to detect and remove in linear regression. In this post we will discuss the Variance Inflation Factor (VIF), a measure used to detect multicollinearity.
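A minimal NumPy sketch of the problem described above, on hypothetical synthetic data (the features, sample size and seed are assumptions for illustration): four independent features plus a fifth that is exactly their sum. The pairwise correlation matrix, which holds the numbers a correlation heat map would color, shows no pair above about 0.5, yet the fifth feature is a perfect linear combination of the other four, which the matrix rank reveals.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical data: x1..x4 are independent, x5 = x1 + x2 + x3 + x4 exactly.
base = rng.normal(size=(n, 4))
data = np.column_stack([base, base.sum(axis=1)])

# Pairwise correlation matrix (what a correlation heat map displays).
corr = np.corrcoef(data, rowvar=False)
print(np.round(corr, 2))

# Largest absolute off-diagonal correlation: no pair looks "heavily" correlated...
off_diag = np.abs(corr[~np.eye(5, dtype=bool)])
print("max |pairwise r|:", round(float(off_diag.max()), 2))

# ...yet the five columns span only a 4-dimensional space, because x5 is a
# perfect linear combination of the other four.
print("rank:", np.linalg.matrix_rank(data))
```

Each of x1..x4 has a correlation of about 1/2 with x5 (Cov(x_i, x5) = 1 and Var(x5) = 4), so a heat-map check alone would probably not flag any pair here.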
What is the Variance Inflation Factor (VIF)?
The Variance Inflation Factor (VIF) quantifies how much the variance of an estimated regression coefficient is inflated by correlation among the predictors. The square root of that variance is the coefficient's standard error, so $\sqrt{\mathrm{VIF}}$ is the factor by which the standard error is inflated. Refer to this post for more details on variance. When multicollinearity exists, the variances of the estimated coefficients are inflated, and there is one VIF for each predictor in a multiple regression model. For example, the VIF for the estimated regression coefficient $b_i$, denoted $\mathrm{VIF}_i$, is the factor by which the variance of $b_i$ is "inflated" by the existence of correlation among the predictor variables in the model.
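To make "inflated" concrete, here is a minimal NumPy sketch on a hypothetical three-predictor design (the data, the correlation of 0.9 between x1 and x2, and the seed are all assumptions). For OLS with a fixed design matrix $X$ that includes an intercept, $\mathrm{Var}(b_j) = \sigma^2 [(X^\top X)^{-1}]_{jj}$; if $x_j$ were uncorrelated with the other predictors, this would reduce to $\sigma^2 / \sum_k (x_{kj} - \bar{x}_j)^2$. The ratio of the two, in which $\sigma^2$ cancels, is exactly the inflation factor.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
rho = 0.9  # assumed correlation between x1 and x2 (illustrative)

# Hypothetical design: x2 is strongly correlated with x1; x3 is independent of both.
x1 = rng.normal(size=n)
x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2, x3])  # column 0 is the intercept

# For OLS with a fixed design, Var(b) = sigma^2 * (X^T X)^{-1}.
xtx_inv = np.linalg.inv(X.T @ X)

# If x_j were uncorrelated with the other predictors, Var(b_j) would be
# sigma^2 / SST_j with SST_j = sum((x_j - mean(x_j))^2).
# The ratio actual / uncorrelated does not depend on sigma^2.
inflation = np.array([
    xtx_inv[j, j] * np.sum((X[:, j] - X[:, j].mean()) ** 2)
    for j in range(1, X.shape[1])  # skip the intercept column
])

print("variance inflation:", np.round(inflation, 2))
print("standard-error inflation:", np.round(np.sqrt(inflation), 2))
```

For this design the first two factors should come out near $1/(1 - 0.9^2) \approx 5.3$ and the third near 1, since x3 is unrelated to the other predictors.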
The formula for the VIF of the ith predictor is:
$$\mathrm{VIF}_i = \frac{1}{1 - R_i^2}$$
where $R_i^2$ is the $R^2$ value obtained by regressing the ith predictor on the remaining predictors.
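As a quick worked example with a hypothetical value, suppose regressing the ith predictor on the others gives $R_i^2 = 0.9$:

$$\mathrm{VIF}_i = \frac{1}{1 - R_i^2} = \frac{1}{1 - 0.9} = 10, \qquad \sqrt{\mathrm{VIF}_i} = \sqrt{10} \approx 3.16$$

So the variance of $b_i$ is ten times, and its standard error about 3.16 times, what it would be if the ith predictor were uncorrelated with the others. At the other extreme, $R_i^2 = 0$ gives $\mathrm{VIF}_i = 1$, i.e. no inflation.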
Now let’s get into the code and see how we could implement this: