Outliers: Another problem that we often face in data is outliers. These are one-off values that stand out from the rest of the population. They may be very large or very small relative to the rest of the data. Outlier detection is a crucial step in exploratory data analysis. Outlier detection…
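As a rough illustration of what "very large or very small with respect to the population" can mean in practice, here is a minimal sketch that flags values using the interquartile range (IQR) rule. The excerpt does not say which detection method the post uses, so the IQR rule, the 1.5 multiplier, and the function name `find_outliers` are assumptions made for this example only.

```python
import statistics

def find_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: the IQR rule with k=1.5 is just one common convention,
    not necessarily the method used in the post.
    """
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

readings = [10, 12, 11, 13, 12, 14, 13, 11, 12, 95]
print(find_outliers(readings))
```

Here the value 95 sits far above the upper fence, so it is the only one flagged. A smaller `k` makes the rule stricter and a larger `k` makes it more tolerant.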
Multicollinearity with Ordinary Least Squares (OLS)
Introduction: Ordinary Least Squares is a method that helps us estimate the unknown parameters in a linear regression model. How does it estimate the parameters, though? It estimates them by minimizing the sum of squared residuals. It does this by drawing a line through the data points such that the squared…
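To make "minimizing the sum of squared residuals" concrete, the sketch below fits a straight line to a handful of points using the standard closed-form least-squares estimates for one feature. The function names `fit_line` and `sum_squared_residuals` and the sample data are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared differences between observed and fitted values."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data that roughly follows y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)
best = sum_squared_residuals(xs, ys, a, b)
# Any other line gives a sum of squared residuals at least as large.
nudged = sum_squared_residuals(xs, ys, a, b + 0.1)
print(a, b, best, nudged)
```

Nudging the slope or the intercept away from the fitted values can only increase the sum of squared residuals, which is what "least squares" refers to.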
Multicollinearity using VIF

Introduction: Collinearity is a condition in the data where two features are heavily correlated with each other. In such situations, we can check for collinearity using a heat map and then omit one of the features based on the results. Multicollinearity, on the other hand, is a more complicated problem to solve. In multicollinearity, chances…
Chi-Squared Test of Independence

Introduction: The Chi-Squared Test of Independence determines whether there is an association between categorical variables. In other words, it indicates whether the variables are related to each other or independent. It’s also called the Chi-Square Test of Association. The test uses a contingency table to determine the association. The contingency table contains the data, which is classified according to…
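As a small sketch of how a contingency table leads to a test statistic, the code below computes the expected count for each cell under independence (row total times column total, divided by the grand total) and then sums the squared differences between observed and expected counts, each divided by its expected count. The function name `chi_squared_statistic` and the example table are hypothetical, and the sketch assumes every expected count is non-zero.

```python
def chi_squared_statistic(table):
    """Return (statistic, degrees_of_freedom) for a contingency table.

    `table` is a list of rows, each a list of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

# Made-up 2x2 table: rows are two groups, columns are two categories.
observed = [[10, 20],
            [20, 10]]
stat, dof = chi_squared_statistic(observed)
print(stat, dof)
```

For one degree of freedom, the critical value at the 5% significance level is about 3.841, so a statistic above that would lead us to reject independence for this made-up table. Turning the statistic into a p-value needs the chi-squared distribution, which is left out of this sketch.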
Hypothesis testing using T-Test

Introduction: A T-test is a type of inferential statistic used to determine whether there is a significant difference between the means of two groups, which may be related in certain features. The types of T-test are the one-sample T-test, the two-sample T-test, and the paired T-test.
References:
https://github.com/krishnaik06/T-test-an-Correlation-using-python/blob/master/Hypothesis_Testing.ipynb
https://www.tutorialspoint.com/scipy/scipy_stats.htm
https://en.wikipedia.org/wiki/Student%27s_t-test
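To give a feel for what a T-test computes, here is a minimal sketch of the one-sample t statistic, reused for the paired case by applying it to the per-pair differences. It uses only the standard library and stops at the statistic itself. The function names and the sample numbers are hypothetical, and the linked references use SciPy rather than this hand-rolled version.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t statistic for testing whether the sample mean differs from mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    # statistics.stdev is the sample standard deviation (n - 1 denominator).
    std_err = statistics.stdev(sample) / math.sqrt(n)
    return (mean - mu0) / std_err

def paired_t(before, after):
    """Paired t statistic: a one-sample test on the per-pair differences."""
    differences = [a - b for a, b in zip(after, before)]
    return one_sample_t(differences, 0.0)

# Made-up measurements.
print(one_sample_t([1, 2, 3, 4, 5], mu0=2))
print(paired_t(before=[10, 12, 14], after=[11, 14, 17]))
```

The statistic is then compared against a t distribution with n - 1 degrees of freedom to obtain a p-value, a step this sketch leaves out.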
R-Squared and Adjusted R-Squared

Introduction: R-Squared and Adjusted R-Squared are key metrics for checking the accuracy of a model in a regression problem. We will look at each in detail in the subsequent sections. Different kinds of problems call for different accuracy checks. For classification problems, we use the confusion matrix, F1-score, precision, recall, etc. You can check…
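The two metrics can be sketched in a few lines using their usual definitions: R-Squared is 1 minus the ratio of the residual sum of squares to the total sum of squares, and Adjusted R-Squared penalizes that value for the number of features used. The function names, the sample values, and the choice of one feature below are assumptions made for illustration.

```python
def r_squared(actual, predicted):
    """1 - SS_res / SS_tot, assuming the actual values are not all identical."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n_samples, n_features):
    """Penalize R-Squared for the number of features in the model."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

# Made-up targets and predictions from a hypothetical one-feature model.
actual = [1, 2, 3, 4, 5]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]

r2 = r_squared(actual, predicted)
adj = adjusted_r_squared(r2, n_samples=len(actual), n_features=1)
print(r2, adj)
```

Adding a feature never lowers R-Squared on the training data, while Adjusted R-Squared only rises if the new feature improves the fit by more than the penalty it incurs.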
Bias and Variance in Machine Learning

Introduction to Bias and Variance: Bias and variance play a very important role when building a model. In simple terms, bias can be interpreted as the model error on the training data, and variance as the model error on the test data. To understand the concept of Bias and…
Ridge and Lasso Regression

Introduction: In this post we will try to understand regularization and hyperparameter tuning using Ridge and Lasso Regression. Before that, we need to understand a few concepts of Linear Regression. I will provide a brief explanation here, which should suffice for this topic; however, if you want to get a more in-depth understanding of…
Cross Validation techniques and its applications

Introduction: Before getting into the details of Cross Validation techniques and their applications, we will look at the steps in a Machine Learning pipeline. This will help us better visualize the purpose of Cross Validation. To understand Cross Validation, we need to know a couple of things that are involved in model creation…
Math behind Simple Linear Regression

Simple Linear Regression: In this post we will try to understand the math behind Simple Linear Regression. But before getting into the details, let’s understand what Simple Linear Regression means. Simple Linear Regression defines the relationship between one feature and a continuous outcome (dependent) variable: y = α + βx. This equation is similar to the…