Exploratory Data Analysis

Different Types of Encoding

Encoding is the technique of converting categorical variables into numerical values so that they can be easily fitted to a machine learning model. Before getting into the details, let's understand the different types of categorical variables. Nominal categorical variable: Nominal categorical variables are those for which we do not have to worry about the…

Handling Numerical Data using StandardScaler

In real-world datasets, features often have very different magnitudes, ranges, or scales. Algorithms that rely on distance may not weigh such features equally. Various data transformation techniques are used to transform the features of our data so that they use the same…
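As a minimal sketch of the idea, assuming scikit-learn is installed, the snippet below standardizes two features that sit on very different scales. The toy array `X` is hypothetical and is not taken from the post.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: column 0 is in single digits, column 1 in thousands.
X = np.array([
    [1.0, 1000.0],
    [2.0, 2000.0],
    [3.0, 3000.0],
    [4.0, 4000.0],
])

# StandardScaler subtracts each column's mean and divides by its
# (population) standard deviation.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print(X_scaled.mean(axis=0))  # per-column means after scaling
print(X_scaled.std(axis=0))   # per-column standard deviations after scaling
```

After scaling, both columns have mean 0 and unit standard deviation, so neither dominates a distance computation simply because of its units.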

Categorical Encoding using One-Hot Encoding

Handling Categorical Data: One-Hot Encoding. In label encoding, categorical data is converted to numerical data, and the values are assigned labels (such as 1, 2, and 3). But there is a flaw here: predictive models that use this numerical data for analysis might sometimes mistake these labels for some kind of order (for example,…
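The contrast between the two encodings can be sketched with pandas. This is an illustration under assumptions: the `color` column and its values are hypothetical, and `pandas.get_dummies` is only one of several ways to one-hot encode.

```python
import pandas as pd

# Hypothetical nominal feature with no natural order.
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# Label-style encoding: each category becomes an integer code.
# The codes follow alphabetical order (blue=0, green=1, red=2),
# which a model could misread as a ranking.
df["color_label"] = df["color"].astype("category").cat.codes

# One-hot encoding: one indicator column per category, no implied order.
one_hot = pd.get_dummies(df["color"], prefix="color")

print(df)
print(one_hot)
```

Each row of `one_hot` has exactly one active column, so no category appears "larger" than another.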

Data Transformation

Data Transformation is the technique of converting data from one format to another. It can be divided into the following steps, each of which is applied depending on the complexity of the transformation. Data Discovery: This is more of an exploratory step, which involves profiling the data using data profiling tools…

Categorical Encoding using Label Encoding

Handling Categorical Data: Label Encoding. In machine learning, we usually encounter data that has multiple labels in one or more columns. These labels can be in character or numeric form. This kind of data cannot be fed in its raw format to a machine learning model. To make the data understandable for the model, it…

Data Integration

Data Integration is a technique for integrating data that resides in different sources. The goal is to provide users with a holistic view of the data. It can be viewed as the practice of consolidating data from various disparate sources. This is viewed as one of the most important steps…

Outlier Detection

Another problem that we often face in data is outliers. They are one-off values that stand out from the population. They may be very large or very small with respect to the entire population of the data. Outlier detection is a crucial step in exploratory data analysis. Outlier detection…

Multicollinearity using VIF

Collinearity is a condition in the data where two features are heavily correlated with each other. In such situations, we can check for collinearity using a heat map and then omit one of the features based on the results. Multicollinearity, on the other hand, is a more complicated problem to solve. In multicollinearity, chances…
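For readers who want a concrete picture of the variance inflation factor (VIF) named in the title, here is a rough NumPy-only sketch. It uses the standard definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing feature j on the remaining features. The helper `vif` and the synthetic features `a`, `b`, `c` are hypothetical and are not the post's own code.

```python
import numpy as np

def vif(X):
    """Return the variance inflation factor of each column of X."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    scores = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on the other columns, with an intercept term.
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid.var() / y.var()
        scores[j] = 1.0 / (1.0 - r2)
    return scores

# Synthetic data: c is nearly a linear combination of a and b, so no
# single pairwise correlation tells the whole story.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + b + rng.normal(scale=0.1, size=200)

scores = vif(np.column_stack([a, b, c]))
print(scores)
```

A common rule of thumb treats a VIF above roughly 5 to 10 as a sign of problematic multicollinearity, though the exact cutoff is a convention and not a hard rule.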

EDA using Probability Density Function and Cumulative Distribution Function

In this post, we will discuss two very important topics and how they help in exploratory data analysis: the Probability Density Function and the Cumulative Distribution Function. The distribution of a continuous random variable can be characterized through its Probability Density Function. We will understand this statement in greater detail in the subsequent section. Cumulative Distribution Function…