# Chi Squared Test

1. Define Null and Alternate Hypothesis
2. State Alpha
3. Calculate Degrees of Freedom
4. State Decision rule
5. Calculate the chi-square test statistic
6. Calculate the critical value
7. State Results and conclusion

In :
import scipy.stats as stats
import pandas as pd
import numpy as np

In :
df = pd.read_csv("Placement.csv")

In :
df.head()

Out:
| | sl_no | gender | ssc_p | ssc_b | hsc_p | hsc_b | hsc_s | degree_p | degree_t | workex | etest_p | specialisation | mba_p | status | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | M | 67.00 | Others | 91.00 | Others | Commerce | 58.00 | Sci&Tech | No | 55.0 | Mkt&HR | 58.80 | Placed | 270000.0 |
| 1 | 2 | M | 79.33 | Central | 78.33 | Others | Science | 77.48 | Sci&Tech | Yes | 86.5 | Mkt&Fin | 66.28 | Placed | 200000.0 |
| 2 | 3 | M | 65.00 | Central | 68.00 | Central | Arts | 64.00 | Comm&Mgmt | No | 75.0 | Mkt&Fin | 57.80 | Placed | 250000.0 |
| 3 | 4 | M | 56.00 | Central | 52.00 | Central | Science | 52.00 | Sci&Tech | No | 66.0 | Mkt&HR | 59.43 | Not Placed | NaN |
| 4 | 5 | M | 85.80 | Central | 73.60 | Central | Commerce | 73.30 | Comm&Mgmt | No | 96.8 | Mkt&Fin | 55.50 | Placed | 425000.0 |

### 1. Define Null and Alternate hypothesis

Null Hypothesis (H0): There is no relationship between the two categorical variables (here, `specialisation` and `status`).

Alternate Hypothesis (H1): There is a relationship between the two categorical variables.
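
In symbols, "no relationship" means the two variables are statistically independent. As a sketch, write $p_{ij}$ for the probability of falling in row $i$ and column $j$ of the contingency table, and $p_{i\cdot}$, $p_{\cdot j}$ for the row and column marginal probabilities (this notation is an assumption introduced here, not part of the original notebook):

```latex
H_0:\; p_{ij} = p_{i\cdot}\, p_{\cdot j} \quad \text{for all } i, j
\qquad\text{versus}\qquad
H_1:\; p_{ij} \neq p_{i\cdot}\, p_{\cdot j} \quad \text{for at least one pair } (i, j)
```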

In :
df_crosstab = pd.crosstab(df['specialisation'],df['status'])

In :
df_crosstab

Out:
| specialisation | Not Placed | Placed |
|---|---|---|
| Mkt&Fin | 25 | 95 |
| Mkt&HR | 42 | 53 |

In :
df_crosstab.values

Out:
array([[25, 95],
       [42, 53]], dtype=int64)

In :
observed_values = df_crosstab.values
print("Observed values: ",observed_values)

Observed values:  [[25 95]
[42 53]]

In :
test_dependence = stats.chi2_contingency(observed_values)

In :
test_dependence

Out:
(12.440229009203623,
 0.00042018425858864284,
 1,
 array([[37.39534884, 82.60465116],
        [29.60465116, 65.39534884]]))

In :
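# chi2_contingency returns (statistic, p-value, dof, expected); index 3 is the expected-frequency array
Expected_value = test_dependence[3]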

In :
print("Expected value: ",Expected_value)

Expected value:  [[37.39534884 82.60465116]
[29.60465116 65.39534884]]
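
The expected counts are what each cell would hold if the two variables were independent: the row total times the column total, divided by the grand total. A minimal sketch of that calculation, with the observed counts copied from the crosstab above (the variable names are illustrative, not from the original notebook):

```python
import numpy as np

# Observed counts copied from the crosstab above
# rows: Mkt&Fin, Mkt&HR; columns: Not Placed, Placed
observed = np.array([[25, 95],
                     [42, 53]])

row_totals = observed.sum(axis=1)    # one total per specialisation
col_totals = observed.sum(axis=0)    # one total per placement status
grand_total = observed.sum()         # total number of students

# E_ij = (row total i) * (column total j) / grand total
expected = np.outer(row_totals, col_totals) / grand_total
print("Expected value: ", expected)
```

This should match the array that `stats.chi2_contingency` returns as its fourth element.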


### 2. State Alpha

In :
alpha = 0.05


### 3. State the degrees of freedom

In :
# Shape of the contingency table: (number of rows, number of columns)
rows, columns = df_crosstab.shape
degree_of_freedom = (rows-1)*(columns-1)
print("Degree of Freedom: ",degree_of_freedom)

Degree of Freedom:  1


### 4. State the decision rule

1. If the chi-square statistic is greater than or equal to the critical value, reject the null hypothesis; otherwise, fail to reject it.
2. If the p-value is less than or equal to alpha, reject the null hypothesis; otherwise, fail to reject it.

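### 5. Calculate the Chi-square test statistic

In :
# Sum (O-E)^2/E down the rows; the result holds one partial sum per column
test_statistic = sum([(O-E)**2/E for O,E in zip(observed_values,Expected_value)])
# Add the two column sums to get the Pearson chi-square statistic
chi_Square_test_statistic = test_statistic[0]+test_statistic[1]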

In :
print("The value of chi-squared test statistic: ",chi_Square_test_statistic)

The value of chi-squared test statistic:  13.508014470676486
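
This hand-computed value (13.508) differs from the 12.440 that `stats.chi2_contingency` reported earlier. For a 2x2 table, SciPy applies Yates' continuity correction by default (`correction=True`), whereas the sum of (O-E)²/E is the uncorrected Pearson statistic. A minimal sketch that reproduces both numbers, with the observed counts copied from the crosstab (variable names are illustrative):

```python
import numpy as np
import scipy.stats as stats

# Observed counts copied from the crosstab above
observed = np.array([[25, 95],
                     [42, 53]])

# Default call: Yates' continuity correction is applied when dof == 1
corrected = stats.chi2_contingency(observed)
# Uncorrected Pearson statistic, matching the manual sum of (O-E)^2/E
uncorrected = stats.chi2_contingency(observed, correction=False)

stat_corrected, p_corrected = corrected[0], corrected[1]
stat_uncorrected, p_uncorrected = uncorrected[0], uncorrected[1]

print("With Yates correction:   ", stat_corrected, p_corrected)
print("Without Yates correction:", stat_uncorrected, p_uncorrected)
```

Both versions lead to the same decision at alpha = 0.05 here, but it is worth stating which one is being reported.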


### 6. Calculate the critical value

In :
critical_value = stats.chi2.ppf(q = 1-alpha,df = degree_of_freedom)
print("critical value: ",critical_value)

critical value:  3.841458820694124
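
The critical value is the point on the chi-square distribution whose upper-tail probability equals alpha, which is why comparing the statistic with the critical value and comparing the p-value with alpha give the same decision. A small sketch of that relationship, reusing alpha = 0.05, one degree of freedom, and the statistic printed in step 5 (the names `dof`, `upper_tail`, and `statistic` are illustrative):

```python
import scipy.stats as stats

alpha = 0.05
dof = 1
statistic = 13.508014470676486   # value printed in step 5

# ppf is the inverse CDF: the point with probability 1 - alpha to its left
critical_value = stats.chi2.ppf(1 - alpha, df=dof)

# sf is the survival function (1 - CDF): the upper-tail probability
upper_tail = stats.chi2.sf(critical_value, df=dof)   # recovers alpha

# The two decision rules agree
reject_by_critical_value = statistic >= critical_value
reject_by_p_value = stats.chi2.sf(statistic, df=dof) <= alpha
print(reject_by_critical_value, reject_by_p_value)
```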


## Alternate Method using p-value

In :
p_value = 1 - stats.chi2.cdf(chi_Square_test_statistic, degree_of_freedom)
print("P Value: ",p_value)
print("significance level: ",alpha)

P Value:  0.0002375467465819403
significance level:  0.05


### 7. State the results and conclusion

In :
# Rule 1: compare the test statistic with the critical value
if chi_Square_test_statistic>=critical_value:
    print("Reject the Null hypothesis H0, as there is a relationship between the 2 categorical variables")
else:
    print("Fail to reject the Null hypothesis H0, as there is no evidence of a relationship between the 2 categorical variables")

# Rule 2: compare the p-value with alpha
if p_value<=alpha:
    print("Reject the Null hypothesis H0, as there is a relationship between the 2 categorical variables")
else:
    print("Fail to reject the Null hypothesis H0, as there is no evidence of a relationship between the 2 categorical variables")

Reject the Null hypothesis H0, as there is a relationship between the 2 categorical variables
Reject the Null hypothesis H0, as there is a relationship between the 2 categorical variables