**Introduction**

In this post we will look in detail at the **Z-score** and its applications with respect to the Gaussian (normal) distribution. We will also discuss **quantiles** and implement them to see how a particular distribution is divided into different **quantiles**.

In layman's terms, the **Z-score** shows how far a data point is from the mean.

More technically, it states how many **standard deviations above** or **below** the **mean** a particular value lies.

The curve shown above is a **Gaussian or Normal Distribution** curve. The center of the curve is the **mean**.

The region within **one standard deviation of the mean**, on both the left and the right, covers about 68.27% of the area under the curve. Similarly, the region within **two standard deviations on both the left and right** covers about 95.45%, and the region within **three standard deviations on both the left and right** covers about 99.73%. This is the empirical rule (also called the 68-95-99.7 rule) for the **Gaussian (normal) distribution**.
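The coverage figures for one, two and three standard deviations can be checked numerically. The snippet below is a minimal sketch using only Python's standard library; the helper name `coverage_within` is made up for illustration. It relies on the identity P(|Z| ≤ k) = erf(k / √2) for a normal distribution.

```python
import math


def coverage_within(k: float) -> float:
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    # For any normal distribution, P(|Z| <= k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {coverage_within(k):.2%}")
# within 1 SD: 68.27%
# within 2 SD: 95.45%
# within 3 SD: 99.73%
```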

Now let’s take a **standard normal distribution** as shown above, which has a **mean** of zero and a **standard deviation** of 1. In that case, a **Z-score** of +1 says that we are 1 standard deviation above the mean. If it is +2, we are 2 standard deviations above the mean, and so on.

Similarly, a Z-score of -1 says that we are 1 standard deviation below the mean.

A Z-score of -2 says that we are 2 standard deviations below the mean, and so on.

__Z-Score formula__

The **Z-score** formula for a single data point is as follows:

$$z = \frac{x - \mu}{\sigma}$$

Where

**x** = score, **µ** = mean of the population, **σ** = population standard deviation

Now, let’s take an example to understand this concept better. Suppose we are considering the heights of students in a class. Let’s say the **mean** height of the students is **150 cm** and the **standard deviation** is **10 cm**, and we have to find the probability that a student's height is greater than **165 cm**, that is, **P(height > 165 cm)**.

Let’s see how this distribution looks on the normal distribution curve below:

If you map the above curve onto a **standard normal distribution curve**, the value of 165 cm falls 1.5 standard deviations above the mean.

The reason is that 150 is the mean and the standard deviation is 10, so 1 SD above the mean is 160 and 2 SD above the mean is 170; 165 is therefore 1.5 SD above the mean.
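The same mapping can be written as a one-line function. This is a small sketch of the Z-score formula applied to the height example; the function name `z_score` is an illustrative choice, not something from a library.

```python
def z_score(x: float, mean: float, std_dev: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev


# Height example: mean 150 cm, standard deviation 10 cm.
print(z_score(165, mean=150, std_dev=10))  # 1.5
```

A value below the mean gives a negative score, for example `z_score(140, 150, 10)` returns `-1.0`.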

Using the **Z-score** table below, if we look up the value of **z**, which is 1.5 in our case, the corresponding entry is 0.9332 (circled in red). This means that the region of the curve below 165 cm is 93.32% of the whole curve, as shown below.

As the complete **standard normal distribution** covers 100% of the area, the portion for which **P(height > 165 cm)** holds is 100 – 93.32 = 6.68%.

So the probability that a student in the class is taller than 165 cm is around 6.68%.
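Instead of reading the value from a printed table, the cumulative probability can be computed directly. The sketch below assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

# Heights modelled as a normal distribution with mean 150 cm and SD 10 cm.
heights = NormalDist(mu=150, sigma=10)

p_below = heights.cdf(165)   # area to the left of 165 cm (the table lookup)
p_above = 1 - p_below        # area to the right of 165 cm

print(f"P(height <= 165 cm) = {p_below:.4f}")  # 0.9332
print(f"P(height >  165 cm) = {p_above:.4f}")  # 0.0668
```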

__Z-score for One Sample__

In the above example, we considered the complete population.

We can calculate the **z-score** for **one sample** as well. The formula has the same form:

$$z = \frac{x_s - \mu_s}{\sigma_s}$$

Where,

$x_s$ = sample score, $\mu_s$ = sample mean, $\sigma_s$ = sample standard deviation

The process for calculating the **z-score** remains the same for samples.
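As a rough illustration of the one-sample case, the sketch below computes a z-score for every value in a small sample. The `sample_heights` list is made-up data, and `statistics.stdev` is used on the assumption that the sample standard deviation (with the n - 1 denominator) is what is intended.

```python
from statistics import mean, stdev

# Hypothetical sample of student heights in cm (illustrative data only).
sample_heights = [142, 148, 150, 153, 157, 160, 165]

sample_mean = mean(sample_heights)
sample_sd = stdev(sample_heights)  # sample standard deviation (n - 1 denominator)

# z-score of each observation relative to the sample's own mean and SD.
z_scores = [(x - sample_mean) / sample_sd for x in sample_heights]

for x, z in zip(sample_heights, z_scores):
    print(f"{x} cm -> z = {z:+.2f}")
```

By construction, these z-scores have a mean of 0 and a sample standard deviation of 1.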

Now that we have seen the formula for calculating the Z-score for **one sample**, let’s go ahead and understand how we could do this when we have **multiple samples**.

__Z-score for Multiple Samples__

The formula below gives the **z-score** when we have **multiple samples**:

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

Where,

$\bar{x}$ is the sample mean, **µ** is the population mean, **σ** is the population standard deviation, and **n** is the sample size (the number of observations in each sample). The denominator $\sigma / \sqrt{n}$ is the standard error of the mean.

Let’s take an example: the mean weight of students in a class is 150 lbs with a standard deviation of 3.0 lbs. What is the probability of finding a random sample of 60 students with a mean weight of 170 lbs, assuming the weight is normally distributed?

$\bar{x}$ = 170, **µ** = 150, **σ** = 3, **n** = 60

As we are dealing with the sampling distribution of means, we have to include the standard error in the formula when calculating the **z-score**.

Another thing to keep in mind here is the empirical rule that we discussed earlier.

According to the empirical rule, 99.73% of the values fall within 3 standard deviations of the mean in a normal distribution. Our **z-score** is $(170 - 150) / (3 / \sqrt{60}) \approx 51.64$ (or 51.28 if the standard error is first rounded to 0.39), which means the sample mean is more than 51 standard errors away from the population mean. The probability that a random sample of 60 students has a mean weight as high as 170 lbs is therefore far below 1%, effectively zero.
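The calculation for the weight example can be reproduced in a few lines. This is a minimal sketch with an illustrative function name; note that without intermediate rounding the result is about 51.64, while rounding the standard error to 0.39 before dividing gives 51.28.

```python
import math


def z_score_of_sample_mean(sample_mean: float, population_mean: float,
                           population_sd: float, n: int) -> float:
    """z-score of a sample mean, using the standard error sigma / sqrt(n)."""
    standard_error = population_sd / math.sqrt(n)
    return (sample_mean - population_mean) / standard_error


# Weight example: sample mean 170 lbs, population mean 150 lbs, SD 3 lbs, n = 60.
z = z_score_of_sample_mean(170, 150, 3, 60)

print(f"standard error = {3 / math.sqrt(60):.4f}")  # 0.3873
print(f"z = {z:.2f}")                                # 51.64
```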

Now that we have understood the concept of the **Z-score**, let’s go ahead and see how we could use it to detect **outliers**.

We will also look at another method of detecting **outliers**, which uses **quantiles**.