Outlier Detection

Outliers Another problem that we often face in data is outliers. They are one-off values that stand out from the rest of the population, and they may be very large or very small relative to the rest of the data. Outlier detection is a crucial step in exploratory data analysis. Outlier detection…Read more
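As a rough illustration of flagging values that are very large or very small relative to the rest, here is a minimal sketch using the 1.5×IQR rule. The rule itself, the helper name `iqr_outliers`, and the sample data are assumptions chosen for illustration; the post may well use a different method.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper: the 1.5*IQR rule is one common convention,
    not necessarily the approach taken in the post.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


data = [10, 12, 11, 13, 12, 11, 95]  # made-up sample; 95 sits far from the rest
print(iqr_outliers(data))
```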

Multicollinearity with Ordinary Least Squares (OLS)

Introduction Ordinary Least Squares (OLS) is a method for estimating the unknown parameters of a linear regression model. How does it estimate the parameters, though? It does so by minimizing the sum of squared residuals. The way it does this is by drawing a line through the data points such that the squared…Read more
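To make the quantity being minimized concrete, here is a small sketch that computes the sum of squared residuals for two candidate lines. The function name, the data, and the candidate lines are all made up for illustration; OLS is the procedure that picks the intercept and slope giving the smallest such value.

```python
def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared differences between observed y and the line's prediction."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Made-up data that lies roughly along y = 2x.
xs = [1, 2, 3, 4]
ys = [2.1, 3.9, 6.2, 7.8]

# Two hypothetical candidate lines: one close to the data, one far from it.
close_fit = sum_squared_residuals(xs, ys, intercept=0.0, slope=2.0)
poor_fit = sum_squared_residuals(xs, ys, intercept=1.0, slope=1.0)

print(close_fit, poor_fit)  # the line closer to the data has the smaller value
```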

Multicollinearity using VIF

Introduction Collinearity is a condition in which two features in the data are heavily correlated with each other. In such situations, we can check for collinearity using a heat map and then omit one of the features based on the results. Multicollinearity, on the other hand, is a more complicated problem to solve. In Multicollinearity, chances…Read more
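The sketch below computes the Pearson correlation between two features (the number a heat map cell would show) and the corresponding variance inflation factor. It assumes the simplest case of exactly two predictors, where VIF = 1 / (1 - r²); with more predictors, r² is replaced by the R² from regressing each feature on all the others. The function names and data are illustrative only.

```python
from math import sqrt


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def vif_two_features(a, b):
    """VIF for the two-predictor case only: 1 / (1 - r^2)."""
    r = pearson_r(a, b)
    return 1.0 / (1.0 - r ** 2)


# Made-up feature columns.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 5, 4, 5]

r = pearson_r(x1, x2)
vif = vif_two_features(x1, x2)
print(round(r, 4), round(vif, 4))
```

A larger VIF means more of a feature's variance is explained by the other feature; the cut-off at which a value counts as problematic is a matter of convention.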

Chi-Squared Test of Independence

Introduction The Chi-Squared Test of Independence determines whether there is an association between categorical variables. In other words, it tells us whether the variables are related to each other or independent. It’s also called the Chi-Square Test of Association. The test uses a contingency table to determine the association. The contingency table contains the data which is classified according to…Read more
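As a minimal sketch of how the statistic is built from a contingency table, the code below computes the expected counts under independence, the chi-squared statistic, and the degrees of freedom. The table values and the function name are invented for illustration, and the p-value lookup is left out.

```python
def chi_squared_statistic(table):
    """Return (statistic, degrees_of_freedom) for a contingency table.

    `table` is a list of rows of observed counts (hypothetical input format).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Made-up 2x2 table: rows are one categorical variable, columns the other.
observed = [[10, 20],
            [20, 10]]
stat, dof = chi_squared_statistic(observed)
print(round(stat, 4), dof)
```

The statistic is then compared against the chi-squared distribution with `dof` degrees of freedom to decide whether the association is significant.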

Hypothesis testing using T-Test

Introduction A T-test is a type of inferential statistic used to determine whether there is a significant difference between the means of two groups, which may be related in certain features. The types of T-test are: one-sample T-test, two-sample T-test, and paired T-test.
References:
https://github.com/krishnaik06/T-test-an-Correlation-using-python/blob/master/Hypothesis_Testing.ipynb
https://www.tutorialspoint.com/scipy/scipy_stats.htm
https://en.wikipedia.org/wiki/Student%27s_t-test
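For the one-sample and paired variants listed above, the test statistic can be sketched in a few lines. The function names and the numbers are made up for illustration, and the sketch stops at the t statistic rather than computing a p-value.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """t statistic for H0: the population mean equals mu0."""
    standard_error = stdev(sample) / sqrt(len(sample))
    return (mean(sample) - mu0) / standard_error


def paired_t(before, after):
    """Paired t statistic: a one-sample test on the pairwise differences."""
    differences = [a - b for a, b in zip(after, before)]
    return one_sample_t(differences, 0.0)


# Made-up measurements.
t_one = one_sample_t([2, 4, 6, 8], mu0=3)
t_paired = paired_t(before=[10, 12, 14, 16], after=[12, 16, 20, 24])
print(round(t_one, 4), round(t_paired, 4))
```

Each statistic is compared against the t distribution with n - 1 degrees of freedom, where n is the number of observations (or pairs).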

R-Squared and Adjusted R-Squared

Introduction R-Squared and Adjusted R-Squared are key measures for checking the accuracy of a regression model. We will look at each in detail in the subsequent sections. Different techniques are used to check the accuracy of different kinds of problems. For classification problems, we use the confusion matrix, F1-score, precision, recall, etc. You can check…Read more
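Both measures can be computed directly from the observed and predicted values. The sketch below uses the standard definitions, R² = 1 - SS_res / SS_tot and Adjusted R² = 1 - (1 - R²)(n - 1) / (n - p - 1), where p is the number of predictors; the function names and the data are assumptions made for illustration.

```python
def r_squared(actual, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_predictors):
    """Penalize R-squared for the number of predictors in the model."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Made-up observations and predictions from a one-predictor model.
y_true = [3, 5, 7, 9]
y_pred = [2.5, 5.5, 6.5, 9.5]

r2 = r_squared(y_true, y_pred)
adj_r2 = adjusted_r_squared(r2, n_samples=len(y_true), n_predictors=1)
print(round(r2, 4), round(adj_r2, 4))
```

Adjusted R-Squared never exceeds R-Squared, and adding a predictor raises it only if the fit improves enough to offset the penalty.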

Bias and Variance in Machine Learning

Introduction to Bias and Variance Bias and variance play a very important role when building a model. In simple terms, bias can be interpreted as the model error on the training data, and variance as the model error on the test data. To understand the concept of Bias and…Read more

Ridge and Lasso Regression

Introduction In this post we will try to understand regularization and hyperparameter tuning using Ridge and Lasso Regression. Before that, we need to understand a few concepts of Linear Regression. I will provide a brief explanation here, which should suffice for this topic; however, if you want to get a more in-depth understanding of…Read more

Math behind Simple Linear Regression

Simple Linear Regression In this post we will try to understand the math behind Simple Linear Regression. But before getting into the details, let’s understand what Simple Linear Regression means. Simple Linear Regression defines the relationship between one feature and a continuous outcome (dependent) variable: y = α + βx. This equation is similar to the…Read more
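For the equation y = α + βx, the least-squares estimates have a closed form: β is the covariance of x and y divided by the variance of x, and α is the mean of y minus β times the mean of x. The sketch below applies those formulas; the function name and the data are made up for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (alpha, beta) for y = alpha + beta * x by least squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    beta = s_xy / s_xx
    alpha = mean_y - beta * mean_x
    return alpha, beta


# Made-up data generated from y = 1 + 2x with no noise.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

alpha, beta = fit_simple_linear_regression(xs, ys)
print(alpha, beta)
```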