## Outlier Detection

Another problem we often face in data is outliers. These are one-off values that stand out from the rest of the population. They may be very large or very small relative to the rest of the data. Outlier detection is a crucial step in exploratory data analysis. Outlier detection…
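As a rough illustration of flagging values that are "very large or very small" relative to the rest of the data, the sketch below uses the interquartile-range (IQR) rule. The excerpt does not say which detection method the post uses, so the IQR rule, the `k=1.5` multiplier, the function name `iqr_outliers`, and the sample data are all assumptions made for this example.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR].

    The IQR rule and k=1.5 are assumptions for illustration only.
    """
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical sample: mostly values near 13, plus two extreme points.
data = [12, 13, 12, 14, 15, 13, 14, 95, 12, 13, -40]
print(iqr_outliers(data))  # [95, -40]
```

Other rules (for example z-scores) would draw the boundary differently; the IQR rule is used here only because it needs no distributional assumption.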

## Multicollinearity with Ordinary Least Squares (OLS)

Ordinary Least Squares (OLS) is a method for estimating the unknown parameters of a linear regression model. How does it estimate them? By minimizing the sum of squared residuals: it draws a line through the data points such that the squared…
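To make "minimizing the sum of squared residuals" concrete, here is a minimal sketch that computes that quantity for two candidate lines. The function name `sum_squared_residuals`, the data, and both candidate lines are hypothetical and chosen only for illustration.

```python
def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared differences between observed y and the line's prediction."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data lying close to the line y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

close_fit = sum_squared_residuals(xs, ys, intercept=0.0, slope=2.0)
poor_fit = sum_squared_residuals(xs, ys, intercept=1.0, slope=1.5)

# OLS prefers the line with the smaller value.
print(close_fit < poor_fit)  # True
```

OLS does not search over candidates like this; it solves for the intercept and slope that make this sum as small as possible.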

## Multicollinearity using VIF

Collinearity is a condition in which two features are heavily correlated with each other. In such situations, we can check for collinearity using a heat map and then omit one of the features based on the results. Multicollinearity, on the other hand, is a more complicated problem to solve. In multicollinearity, chances…
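The variance inflation factor (VIF) of a feature is 1 / (1 - R²), where R² comes from regressing that feature on the other features. With only two features, that R² equals the squared Pearson correlation between them, which allows a small dependency-free sketch. The helper names and the data below are hypothetical, and the two-feature restriction is an assumption made to keep the example short; real multicollinearity checks regress each feature on all the others.

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def vif_two_features(a, b):
    """VIF when the model has exactly two features: 1 / (1 - r^2)."""
    r = pearson_r(a, b)
    return 1.0 / (1.0 - r * r)

# Hypothetical features: x2 is nearly a multiple of x1, x3 is not.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
x3 = [5, 1, 4, 2, 6, 3]

print(vif_two_features(x1, x2) > 10)  # True: strongly collinear pair
print(vif_two_features(x1, x3) < 2)   # True: nearly uncorrelated pair
```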

## Chi-Squared Test of Independence

The Chi-Squared Test of Independence determines whether there is an association between categorical variables, that is, whether the variables are related to each other or independent. It is also called the Chi-Square Test of Association. The test uses a contingency table to determine the association. The contingency table contains the data, classified according to…
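A minimal sketch of how the test statistic is computed from a contingency table: each expected count is (row total × column total) / grand total, and the statistic sums (observed - expected)² / expected over all cells. The function name `chi_squared_statistic` and the table values are hypothetical. The sketch stops at the statistic and degrees of freedom; turning them into a p-value needs the chi-squared distribution, which is not implemented here.

```python
def chi_squared_statistic(table):
    """Return (chi2, degrees_of_freedom) for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

# Hypothetical 2x2 table: rows are one category, columns another.
observed = [[30, 10],
            [20, 40]]
chi2, dof = chi_squared_statistic(observed)
print(round(chi2, 3), dof)  # 16.667 1
```

For one degree of freedom the 5% critical value is about 3.841, so a statistic this large would lead to rejecting independence for this made-up table.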

## Hypothesis testing using T-Test

A T-test is a type of inferential statistic used to determine whether there is a significant difference between the means of two groups, which may be related in certain features. The types of T-test are:

- One-sample T-test
- Two-sample T-test
- Paired T-test

Reference:

- https://github.com/krishnaik06/T-test-an-Correlation-using-python/blob/master/Hypothesis_Testing.ipynb
- https://www.tutorialspoint.com/scipy/scipy_stats.htm
- https://en.wikipedia.org/wiki/Student%27s_t-test

R-Squared and Adjusted R-Squared are key metrics for checking how well a regression model fits. We will look at each in detail in the subsequent sections. Different kinds of problems call for different accuracy checks. For classification problems, we use the confusion matrix, F1-score, precision, recall, etc. You can check…
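For the R-Squared and Adjusted R-Squared metrics mentioned above, the standard definitions are R² = 1 - SS_res / SS_tot and adjusted R² = 1 - (1 - R²)(n - 1) / (n - p - 1), where n is the number of samples and p the number of predictors. The sketch below applies them to hypothetical values; the function names and the data are assumptions for illustration.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n_samples, n_predictors):
    """Penalize R^2 for the number of predictors used by the model."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

# Hypothetical observed values and predictions from a one-feature model.
y_true = [3, 5, 7, 9, 11]
y_pred = [2.8, 5.1, 7.2, 8.9, 11.0]

r2 = r_squared(y_true, y_pred)
adj_r2 = adjusted_r_squared(r2, n_samples=len(y_true), n_predictors=1)
print(round(r2, 4), round(adj_r2, 4))  # 0.9975 0.9967
```

Adjusted R² is never larger than R², and the gap widens as predictors are added without improving the fit.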

## Bias and Variance in Machine Learning

Bias and variance play a very important role in building a model. To frame it in simple terms, bias can be interpreted as the model error on the training data, and variance as the model error on the test data. To understand the concept of Bias and…

## Ridge and Lasso Regression

In this post we will try to understand regularization and hyperparameter tuning using Ridge and Lasso Regression. Before that, we need to understand a few concepts of Linear Regression. I will provide a brief explanation here that is sufficient for this topic; however, if you want a more in-depth understanding of…
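As a rough sketch of what regularization does, consider a single centered feature with an unpenalized intercept. Minimizing the squared error plus λβ² (ridge) gives β = Sxy / (Sxx + λ), and minimizing the squared error plus λ|β| (lasso) gives a soft-thresholded slope. The single-feature setting, the exact form of the penalties, the function names, and the data are all assumptions for this example and are not taken from the post.

```python
def centered_sums(xs, ys):
    """Return (Sxx, Sxy) computed on mean-centered data."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxx, sxy

def ridge_slope(xs, ys, lam):
    """Slope minimizing sum of squared errors + lam * beta**2."""
    sxx, sxy = centered_sums(xs, ys)
    return sxy / (sxx + lam)

def lasso_slope(xs, ys, lam):
    """Slope minimizing sum of squared errors + lam * abs(beta)."""
    sxx, sxy = centered_sums(xs, ys)
    shrunk = max(abs(sxy) - lam / 2.0, 0.0)  # soft-thresholding
    return (shrunk if sxy >= 0 else -shrunk) / sxx

# Hypothetical data on the exact line y = 2x, so the OLS slope is 2.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]

print(ridge_slope(xs, ys, lam=0))   # 2.0 (no penalty: plain OLS)
print(ridge_slope(xs, ys, lam=10))  # 1.0 (shrunk toward zero)
print(lasso_slope(xs, ys, lam=10))  # 1.5 (shrunk toward zero)
print(lasso_slope(xs, ys, lam=40))  # 0.0 (set exactly to zero)
```

The last two lines show the usual contrast: ridge shrinks a coefficient but keeps it nonzero, while a large enough lasso penalty removes it entirely. The penalty strength λ is the hyperparameter that tuning would select.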

## Cross Validation techniques and its applications

Before getting into the details of cross validation techniques and their applications, we will look at the steps in a machine learning pipeline. This will help us better visualize the purpose of cross validation. To understand cross validation, we need to know a couple of things that are involved in model creation…
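One common cross validation scheme is k-fold: the samples are split into k folds, and each fold serves once as the test set while the remaining folds form the training set. The excerpt does not say which schemes the post covers, so the choice of k-fold, the function name `k_fold_indices`, and the unshuffled contiguous folds are assumptions for this sketch.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    indices = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# Ten hypothetical samples split into three folds.
for train, test in k_fold_indices(10, 3):
    print("train:", train, "test:", test)
```

In practice the indices are usually shuffled first, and the model is trained and scored once per fold, with the scores averaged at the end.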

## Math behind Simple Linear Regression

In this post we will try to understand the math behind Simple Linear Regression. But before getting into the details, let's understand what Simple Linear Regression means. Simple Linear Regression defines the relationship between one feature and a continuous outcome (dependent) variable: y = α + βx. This equation is similar to the…
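For the model y = α + βx, the least-squares estimates have a closed form: β = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and α = ȳ - βx̄. The sketch below applies those formulas; the function name `fit_simple_linear_regression` and the data are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (alpha, beta) for y = alpha + beta * x by least squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta

# Hypothetical data generated from y = 1 + 2x with no noise.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

alpha, beta = fit_simple_linear_regression(xs, ys)
print(alpha, beta)  # 1.0 2.0
```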