In [73]:

```
import pandas as pd
import statsmodels.api as sm
```

In [91]:

```
df = pd.read_csv("Advertising.csv",index_col=0)
```

In [92]:

```
df.head()
```

Out[92]:

In [93]:

```
X = df.iloc[:,:-1]
y = df.iloc[:,-1]
```

In [94]:

```
X.head()
```

Out[94]:

In [95]:

```
y.head()
```

Out[95]:

In [96]:

```
from sklearn.linear_model import LinearRegression

# Fit a plain least-squares model with scikit-learn
linear = LinearRegression()
linear.fit(X, y)
print("Coefficients:", linear.coef_)
print("Intercept:", linear.intercept_)
```

In [97]:

```
X = sm.add_constant(X)
```

In [98]:

```
X.head()
```

Out[98]:

In [99]:

```
model = sm.OLS(y,X).fit()
```

Notice that the coefficients and the intercept of the scikit-learn `LinearRegression` model and the statsmodels OLS model are identical, so the two fits are equivalent. The main advantage of the OLS model is that it gives us a summary report of the model.

The report covers how well the model fits, which parameters to look at, and the statistical tests performed to decide whether each feature is necessary for the model. All of these are included in the OLS summary report.

First, let's start with the R-Squared metric and see what it says about the model.

R-Squared is the proportion of variance in the dependent variable that is predictable from the independent variables. The key idea: if the linear regression model fits well, the R-Squared value will be close to 1. There is also a misconception that R-Squared cannot be negative, but that is not true: a model that fits the data worse than simply predicting the mean of the target yields a negative R-Squared.

One more thing we need to check is the Adjusted R-Squared value. If we keep adding features to the model for better prediction accuracy, R-Squared will only move closer to 1, even when the new features add little value. **Adjusted R-Squared** counters this by penalizing R-Squared for non-useful predictors: if we add features that have very little relevance to the model, the **Adjusted R-Squared** value goes down. If **Adjusted R-Squared** is much lower than **R-Squared**, it's a sign that some variable may not be relevant to the model, so we should find that variable and remove it.
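The penalty has a closed form: with n observations and p predictors, adjusted R-squared is 1 − (1 − R²)(n − 1)/(n − p − 1). A small sketch of that formula (the helper name is mine, not part of statsmodels):

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R-squared for n observations and p predictors (excluding the constant)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Advertising data: 200 rows, 3 predictors (TV, radio, newspaper)
print(adjusted_r2(0.897, 200, 3))
```

Feeding in the rounded R-squared of 0.897 lands close to the 0.896 reported in the summary; the small difference comes from rounding the input.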

In our case the **R-Squared** value is **0.897** and the **Adjusted R-Squared** is **0.896**. So it's a well-fitted model, and we don't have any feature that is hurting model performance out of proportion.

So, a couple of things to keep in mind:

- First, check how close **R-Squared** is to 1. If it is close, our model is fitted well.
- Check whether **Adjusted R-Squared** and **R-Squared** are very different. If they are close, we have selected relevant features; if they are far apart, there is a chance we have included a feature that is less relevant.

The **F-statistic** is used to assess the significance of the overall regression model. To understand this metric we need to consider two cases:

- Model 1: a model with no features and only an intercept, thus called an intercept-only model
- Model 2: a model with all the features (in our case: TV, radio, newspaper)

Now let's state the null hypothesis (H0) and the alternate hypothesis (H1):

- H0: the two models are equivalent (the features add no explanatory power).
- H1: the intercept-only model (Model 1) is worse than Model 2.

Based on these hypotheses, the test returns a **p-value** that helps us decide whether to reject the null hypothesis.

From the summary below, we can see that the **p-value** is close to 0 and the **F-statistic** is very large, so we can reject the null hypothesis H0. There is therefore clear evidence that **there is a linear relationship between the features TV, radio, newspaper and the target variable, sales.**

So, an **F-statistic** much greater than 1 together with a **p-value** below 0.05 signifies a strong linear relationship between the feature variables and the target variable.

If we want to check whether a particular variable is significant or relevant to the target variable, we perform a **t-test**.
The **t-test** checks the relationship between the target variable and each predictor variable individually.

Rather than considering all the features together, it takes one coefficient at a time and tests whether that feature is related to the target.

It basically performs:

- t-test 1 → Feature 1 (TV) and the target
- t-test 2 → Feature 2 (radio) and the target
- t-test 3 → Feature 3 (newspaper) and the target

We perform the **t-test** under the following hypotheses:

- Null hypothesis (H0): the coefficient of the feature is 0.
- Alternate hypothesis (H1): the coefficient of the feature is not equal to 0.

A **higher t-value** (equivalently, a **lower p-value**) signifies that we can reject the null hypothesis and accept the alternate hypothesis.

In our case, looking at the **p-value** column, the **constant** term and the **TV** and **radio** features all have a **p-value** of 0.000. All of these are below 0.05, our threshold at a 95% confidence level. Therefore we **reject the null hypothesis and accept the alternate hypothesis: the coefficients of these features are non-zero.**

Now if we look at the **newspaper** feature, the **t-value** is very small and the **p-value** is very high: 0.860. We reject the null hypothesis only when the **p-value** is below 0.05, and here it is well above that, so we cannot reject the null hypothesis. Under the null hypothesis, the feature's coefficient is 0, which means the **newspaper** feature appears irrelevant to our linear regression model. **Therefore we can ignore or drop this feature.**

In [100]:

```
model.summary()
```

Out[100]:

In [101]:

```
X.iloc[:,1:].corr()
```

Out[101]:

In [109]:

```
cols = ['const','TV','radio']
Xvar = X[cols]
```

In [110]:

```
Xvar.head()
```

Out[110]:

In [111]:

```
model = sm.OLS(y,Xvar).fit()
```

In [112]:

```
model.summary()
```

Out[112]:

In [113]:

```
Xvar.iloc[:,1:].corr()
```

Out[113]: