Encoding is the technique of converting categorical variables into numerical values so that they can be easily fitted to a machine learning model. Before getting into the details, let's look at the different types of categorical variables. Nominal categorical variables: these are the variables for which we do not have to worry about the…Read more
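As a minimal library-free sketch (the column values are made up for illustration), a nominal variable such as colour can be mapped to integer codes. Note that the numbers are arbitrary identifiers, not a ranking:

```python
# Map a nominal categorical column to integer codes.
# The codes are arbitrary identifiers, not an ordering.
colours = ["red", "green", "blue", "green", "red"]

# Build a mapping from each distinct category to an integer.
code_for = {cat: i for i, cat in enumerate(sorted(set(colours)))}
encoded = [code_for[c] for c in colours]

print(code_for)   # {'blue': 0, 'green': 1, 'red': 2}
print(encoded)    # [2, 1, 0, 1, 2]
```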

# Exploratory Data Analysis

## Handling Numerical Data using StandardScaler

In real life, the values in a dataset may have a variety of magnitudes, ranges, or scales. Algorithms that use distance as a parameter may not weigh all of these features in the same way. There are various data transformation techniques used to transform the features of our data so that they use the same…Read more
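StandardScaler itself lives in scikit-learn; as a library-free sketch of what it computes, each value is rescaled as z = (x - mean) / std so the feature has zero mean and unit standard deviation:

```python
import statistics

def standard_scale(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
scaled = standard_scale(heights_cm)
print(scaled)
```

After scaling, distance-based algorithms see this feature on the same footing as any other standardized feature, regardless of its original units.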

## Categorical Encoding using One-Hot Encoding

In label encoding, categorical data is converted to numerical data and the values are assigned labels (such as 1, 2, and 3). But there is a flaw here: predictive models that use this numerical data for analysis might mistake these labels for some kind of order (for example,…Read more
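One-hot encoding avoids that false ordering by giving each category its own 0/1 column. A minimal sketch, without scikit-learn's `OneHotEncoder`:

```python
def one_hot(values):
    """One-hot encode a list of categories: one 0/1 column per distinct category."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)  # all zeros...
        row[index[v]] = 1            # ...except the column for this category
        rows.append(row)
    return categories, rows

cats, matrix = one_hot(["low", "high", "medium", "low"])
print(cats)    # ['high', 'low', 'medium']
print(matrix)  # [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
```

Because every row is equidistant from every other row's category column, no spurious order is implied.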

## Data Transformation

Data Transformation is the technique of converting data from one format to another. It can be divided into the following steps, each applied based on the complexity of the transformation. Data Discovery: this is more of an exploratory step, which involves profiling the data using data profiling tools…Read more
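As an illustrative sketch of the data discovery step (the column names and records are made up), a quick profile might count rows, distinct values, and missing values per column:

```python
def profile(records):
    """Summarise each column: row count, distinct non-null values, and missing (None) values."""
    summary = {}
    for col in records[0].keys():
        values = [r[col] for r in records]
        summary[col] = {
            "rows": len(values),
            "distinct": len({v for v in values if v is not None}),
            "missing": sum(1 for v in values if v is None),
        }
    return summary

data = [
    {"city": "Pune", "sales": 10},
    {"city": "Pune", "sales": None},
    {"city": "Delhi", "sales": 7},
]
report = profile(data)
print(report["sales"])  # {'rows': 3, 'distinct': 2, 'missing': 1}
```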

## Categorical Encoding using Label Encoding

In machine learning we usually encounter data that has multiple labels in one or more columns. These labels can be in character or numeric form, and such data cannot be fed to a machine learning model in its raw format. To make the data understandable for the model, it…Read more
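A minimal sketch of label encoding, loosely mirroring the fit/transform shape of scikit-learn's `LabelEncoder` (this class is illustrative, not the library's implementation):

```python
class SimpleLabelEncoder:
    """Minimal label encoder: maps each distinct label to an integer code."""

    def fit(self, labels):
        # Sort the distinct labels so the mapping is deterministic.
        self.classes_ = sorted(set(labels))
        self._to_int = {c: i for i, c in enumerate(self.classes_)}
        return self

    def transform(self, labels):
        return [self._to_int[label] for label in labels]

enc = SimpleLabelEncoder().fit(["cold", "hot", "warm", "hot"])
codes = enc.transform(["hot", "cold"])
print(enc.classes_)  # ['cold', 'hot', 'warm']
print(codes)         # [1, 0]
```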

## Data Integration

Data Integration is the technique of integrating data that resides in different sources, with the goal of providing users with a holistic view of the data. It can be viewed as the practice of consolidating data from various disparate sources, and is regarded as one of the most important steps…Read more
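As a toy sketch of consolidating two sources (the record fields here are invented for illustration), an inner join on a shared key combines matching records into one view:

```python
def join_on(left, right, key):
    """Inner-join two lists of records (dicts) on a shared key field."""
    right_by_key = {r[key]: r for r in right}
    merged = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            # Combine the fields from both sources into one record.
            merged.append({**row, **match})
    return merged

crm = [{"cust_id": 1, "name": "Asha"}, {"cust_id": 2, "name": "Ravi"}]
billing = [{"cust_id": 1, "balance": 250.0}]
combined = join_on(crm, billing, "cust_id")
print(combined)  # [{'cust_id': 1, 'name': 'Asha', 'balance': 250.0}]
```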

## Outlier Detection

Another problem that we often face in data is outliers. They are the one-off values that stand out from the rest of the population; they may be very large or very small with respect to the data as a whole. Outlier detection is a crucial step in exploratory data analysis. Outlier detection…Read more
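One common way to flag such values is the interquartile-range (IQR) rule: anything below Q1 - 1.5*IQR or above Q3 + 1.5*IQR is treated as an outlier. A stdlib-only sketch:

```python
import statistics

def iqr_outliers(values):
    """Flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

data = [10, 12, 11, 13, 12, 11, 95]
outliers = iqr_outliers(data)
print(outliers)  # [95]
```

The 1.5 multiplier is the conventional choice; widening it to 3.0 flags only the most extreme points.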

## Multicollinearity using VIF

Introduction Collinearity is a condition in the data where two features are heavily correlated with each other. In such situations, we can check for collinearity using a heat map and then omit one of the features based on the results. Multicollinearity, on the other hand, is a more complicated problem to solve. In multicollinearity, chances…Read more
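The VIF of a feature is 1 / (1 - R²), where R² comes from regressing that feature on all the others. As a simplified sketch for the two-feature case (where R² is just the squared Pearson correlation), with made-up data:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def vif_two_features(x, y):
    """VIF = 1 / (1 - R^2); with only two predictors, R^2 = r^2."""
    r2 = pearson(x, y) ** 2
    return 1.0 / (1.0 - r2)

x1 = [1.0, 2.0, 3.0, 4.0]
nearly_x1 = [1.1, 2.0, 2.9, 4.2]    # almost collinear with x1 -> huge VIF
unrelated = [1.0, -1.0, -1.0, 1.0]  # uncorrelated with x1 -> VIF of 1
```

A common rule of thumb treats VIF above 5 (or 10) as a sign of problematic multicollinearity.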

## Cross Validation Techniques and Their Applications

Introduction Before getting into the details of cross validation techniques and their applications, we will look at the steps in a machine learning pipeline. This will help us better visualize the purpose of cross validation. To understand cross validation, we need to know a couple of things involved in model creation….Read more
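The core mechanic of k-fold cross validation is splitting the sample indices into k validation folds, training on the rest each time. A stdlib-only sketch of the index splitting (scikit-learn's `KFold` does this, plus shuffling options):

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous (train, validation) folds."""
    # Earlier folds absorb the remainder when n_samples is not divisible by k.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n_samples) if i not in set(val)]
        folds.append((train, val))
        start += size
    return folds

splits = k_fold_indices(10, 3)
for train, val in splits:
    print(len(train), len(val))  # each sample appears in exactly one validation fold
```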

## EDA using Probability Density Function and Cumulative Distribution Function

Introduction In this post, we will discuss two very important topics and how they help in exploratory data analysis: the Probability Density Function and the Cumulative Distribution Function. A continuous random variable's distribution can be characterized through its Probability Density Function. We will understand this statement in greater detail in the subsequent section. The Cumulative Distribution Function…Read more
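In EDA we usually work with the empirical CDF of a sample: F(x) is the fraction of observations less than or equal to x. A small sketch, with an invented sample:

```python
from bisect import bisect_right

def make_ecdf(sample):
    """Return the empirical CDF of a sample: F(x) = fraction of values <= x."""
    data = sorted(sample)
    n = len(data)

    def ecdf(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(data, x) / n

    return ecdf

F = make_ecdf([2, 5, 5, 8, 10])
print(F(1))   # 0.0  (below the minimum)
print(F(5))   # 0.6  (3 of 5 values are <= 5)
print(F(10))  # 1.0  (at the maximum)
```

The ECDF is a step function that rises from 0 to 1, which makes it handy for reading off percentiles directly from data without assuming any distribution.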