Chi Squared Test

  1. Define the null and alternate hypotheses
  2. State alpha
  3. Calculate the degrees of freedom
  4. State the decision rule
  5. Calculate the chi-square test statistic
  6. Calculate the critical value
  7. State the results and conclusion
In [82]:
import scipy.stats as stats
import pandas as pd 
import numpy as np
In [83]:
df = pd.read_csv("Placement.csv")
In [84]:
df.head()
Out[84]:
   sl_no  gender  ssc_p    ssc_b  hsc_p    hsc_b     hsc_s  degree_p   degree_t  workex  etest_p  specialisation  mba_p      status    salary
0      1       M  67.00   Others  91.00   Others  Commerce     58.00   Sci&Tech      No     55.0          Mkt&HR  58.80      Placed  270000.0
1      2       M  79.33  Central  78.33   Others   Science     77.48   Sci&Tech     Yes     86.5         Mkt&Fin  66.28      Placed  200000.0
2      3       M  65.00  Central  68.00  Central      Arts     64.00  Comm&Mgmt      No     75.0         Mkt&Fin  57.80      Placed  250000.0
3      4       M  56.00  Central  52.00  Central   Science     52.00   Sci&Tech      No     66.0          Mkt&HR  59.43  Not Placed       NaN
4      5       M  85.80  Central  73.60  Central  Commerce     73.30  Comm&Mgmt      No     96.8         Mkt&Fin  55.50      Placed  425000.0

1. Define the null and alternate hypotheses

Null Hypothesis (H0): There is no relationship between the two categorical variables, specialisation and status (they are independent).

Alternate Hypothesis (H1): There is a relationship between specialisation and status (they are not independent).

In [85]:
df_crosstab = pd.crosstab(df['specialisation'],df['status'])
In [86]:
df_crosstab
Out[86]:
status          Not Placed  Placed
specialisation
Mkt&Fin                 25      95
Mkt&HR                  42      53
In [87]:
df_crosstab.values
Out[87]:
array([[25, 95],
       [42, 53]], dtype=int64)
In [88]:
observed_values = df_crosstab.values
print("Observed values: ",observed_values)
Observed values:  [[25 95]
 [42 53]]
In [89]:
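# Returns (test statistic, p-value, degrees of freedom, expected frequencies).
# For a 2x2 table (1 degree of freedom) SciPy applies Yates' continuity correction by default.
test_dependence = stats.chi2_contingency(observed_values)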
In [90]:
test_dependence
Out[90]:
(12.440229009203623,
 0.00042018425858864284,
 1,
 array([[37.39534884, 82.60465116],
        [29.60465116, 65.39534884]]))
In [91]:
# The fourth element holds the frequencies expected under H0
Expected_value = test_dependence[3]
In [92]:
print("Expected value: ",Expected_value)
Expected value:  [[37.39534884 82.60465116]
 [29.60465116 65.39534884]]
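
The expected frequencies above follow from the row and column totals of the observed table: under H0, each expected count is (row total x column total) / grand total. A minimal pure-Python sketch of that calculation, assuming the observed counts are hard-coded from the crosstab; the variable names are illustrative and not part of the original notebook:

```python
# Observed counts copied from the crosstab
# rows: Mkt&Fin, Mkt&HR; columns: Not Placed, Placed
observed = [[25, 95], [42, 53]]

row_totals = [sum(row) for row in observed]        # one total per specialisation
col_totals = [sum(col) for col in zip(*observed)]  # one total per status
grand_total = sum(row_totals)                      # total number of students

# Expected count under independence: row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for row in expected:
    print([round(value, 8) for value in row])
```

The printed values should agree with the fourth element returned by `stats.chi2_contingency`.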

2. State Alpha

In [93]:
alpha = 0.05

3. Calculate the degrees of freedom

In [94]:
# Take the table dimensions from the crosstab itself instead of hard-coding 2x2
rows, columns = df_crosstab.shape
degree_of_freedom = (rows-1)*(columns-1)
print("Degree of Freedom: ",degree_of_freedom)
Degree of Freedom:  1

4. State the decision rule

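  1. Critical-value approach: if the chi-square statistic is greater than or equal to the critical value, reject the null hypothesis; otherwise, fail to reject it.
  2. P-value approach: if the p-value is less than or equal to alpha, reject the null hypothesis; otherwise, fail to reject it.

Both rules give the same decision when the p-value is computed from the same statistic and degrees of freedom.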

5. Calculate the Chi-square test statistic

In [95]:
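# Pearson chi-square statistic: sum of (O-E)^2/E over every cell (no continuity correction)
# Summing over the rows of the two arrays gives one partial sum per column
test_statistic = sum([(O-E)**2/E for O,E in zip(observed_values,Expected_value)])
# Add the per-column sums; .sum() works for any number of columns
chi_Square_test_statistic = test_statistic.sum()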
In [96]:
print("The value of chi-squared test statistic: ",chi_Square_test_statistic)
The value of chi-squared test statistic:  13.508014470676486
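
This hand-computed value (13.508...) differs from the first element returned by `stats.chi2_contingency` earlier (12.440...). The gap is consistent with Yates' continuity correction, which `chi2_contingency` applies by default when there is 1 degree of freedom (its `correction` parameter defaults to `True`); passing `correction=False` should reproduce the uncorrected statistic. The sketch below recomputes both versions in plain Python, assuming the observed counts are hard-coded from the crosstab; `pearson_statistic` is an illustrative helper, not a SciPy function:

```python
# Observed counts copied from the crosstab
observed = [[25, 95], [42, 53]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]


def pearson_statistic(observed, expected, yates=False):
    """Sum (|O - E| - shift)^2 / E over all cells; shift is 0.5 with Yates' correction."""
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            diff = abs(o - e)
            if yates:
                # Shrink each deviation by 0.5, but never past zero
                diff = max(diff - 0.5, 0.0)
            total += diff ** 2 / e
    return total


uncorrected = pearson_statistic(observed, expected)
corrected = pearson_statistic(observed, expected, yates=True)
print("Without Yates' correction:", uncorrected)
print("With Yates' correction:   ", corrected)
```

Both statistics are far above the critical value here, so the choice of correction does not change the conclusion for this table.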

6. Calculate the critical value

In [97]:
critical_value = stats.chi2.ppf(q = 1-alpha,df = degree_of_freedom)
print("critical value: ",critical_value)
critical value:  3.841458820694124
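
As a sanity check on this number: with 1 degree of freedom, a chi-square variable is the square of a standard normal variable, so the critical value equals the squared two-sided normal quantile. The sketch below uses only the Python standard library and is valid only for 1 degree of freedom; the variable names are illustrative:

```python
from statistics import NormalDist

alpha = 0.05

# Two-sided standard normal quantile (about 1.96 for alpha = 0.05)
z = NormalDist().inv_cdf(1 - alpha / 2)

# For 1 degree of freedom, the chi-square critical value is z squared
critical_value_df1 = z ** 2
print("critical value (df = 1):", critical_value_df1)
```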

Alternative method using the p-value

In [101]:
# Upper-tail probability of the chi-square distribution at the test statistic
p_value = 1- stats.chi2.cdf(chi_Square_test_statistic,degree_of_freedom)
print("P Value: ",p_value)
print("significance level: ",alpha)
P Value:  0.0002375467465819403
significance level:  0.05
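
The p-value is the upper-tail probability of the chi-square distribution at the test statistic; SciPy also exposes it directly as `stats.chi2.sf`. For 1 degree of freedom it has a closed form, p = erfc(sqrt(x / 2)), which allows a quick cross-check with the standard library alone. A sketch under that assumption (1 degree of freedom only), with the statistic hard-coded from the output of step 5:

```python
import math

# Uncorrected chi-square statistic printed in step 5
chi_square_statistic = 13.508014470676486

# For 1 degree of freedom: P(X >= x) = erfc(sqrt(x / 2))
p_value_df1 = math.erfc(math.sqrt(chi_square_statistic / 2))
print("P value (df = 1):", p_value_df1)
```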

7. State the results and conclusion

In [102]:
# Critical-value approach
if chi_Square_test_statistic>=critical_value:
    print("Reject the null hypothesis H0: there is evidence of a relationship between the 2 categorical variables")
else:
    print("Fail to reject the null hypothesis H0: there is not enough evidence of a relationship between the 2 categorical variables")
# P-value approach
if p_value<=alpha:
    print("Reject the null hypothesis H0: there is evidence of a relationship between the 2 categorical variables")
else:
    print("Fail to reject the null hypothesis H0: there is not enough evidence of a relationship between the 2 categorical variables")
Reject the null hypothesis H0: there is evidence of a relationship between the 2 categorical variables
Reject the null hypothesis H0: there is evidence of a relationship between the 2 categorical variables
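
The seven steps can be collected into one reusable function. The sketch below is one possible arrangement, not the notebook's own code: `chi_square_independence` and its dictionary keys are hypothetical names, the observed counts are hard-coded from the crosstab, and `correction=False` is chosen so the statistic matches the hand calculation in step 5.

```python
import numpy as np
from scipy import stats


def chi_square_independence(observed, alpha=0.05, correction=False):
    """Run a chi-square test of independence on a contingency table."""
    observed = np.asarray(observed)
    # chi2_contingency returns statistic, p-value, degrees of freedom, expected counts
    statistic, p_value, dof, expected = stats.chi2_contingency(
        observed, correction=correction
    )
    # Critical value: the (1 - alpha) quantile of the chi-square distribution
    critical_value = stats.chi2.ppf(1 - alpha, dof)
    return {
        "statistic": float(statistic),
        "p_value": float(p_value),
        "dof": int(dof),
        "critical_value": float(critical_value),
        "expected": expected,
        "reject_h0": bool(p_value <= alpha),
    }


result = chi_square_independence([[25, 95], [42, 53]])
print("Statistic:", result["statistic"])
print("P value:", result["p_value"])
print("Reject H0:", result["reject_h0"])
```

In the notebook, the same function could be called with `df_crosstab.values` instead of the hard-coded table.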