Creating Feature and Target Matrix

Before building any model, the first thing we generally do is create the feature matrix (the independent variables) and the target (the dependent variable). Let’s see how to do that.

Before that, let’s understand our dataset which was taken from Kaggle:

Also refer to this post to see how we implement an algorithm after selecting the feature and target matrix.

Context

  • A new coronavirus, designated 2019-nCoV, was first identified in Wuhan, the capital of China’s Hubei province.
  • People developed pneumonia without a clear cause, and existing vaccines or treatments were not effective against it.
  • The virus has shown evidence of human-to-human transmission.
  • The transmission rate (rate of infection) appeared to escalate in mid-January 2020.
  • As of 30 January 2020, approximately 8,243 cases had been confirmed.

Content

Each row contains the report from one region/location for one day.
The Confirmed, Deaths, and Recovered columns give the number of cases reported for that country/region on that day.

Now that we know what the dataset is about, let’s dive straight in and see what it looks like:

1. Load the dataset

import pandas as pd
df = pd.read_csv("covid_19_clean_complete.csv")
df.head()
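
For readers who do not have the Kaggle file at hand, the sketch below builds a tiny stand-in from an inline CSV string. The column names match the real dataset, but the values and the names `csv_text` and `sample` are made up for illustration. It also shows that an empty Province/State field is read as a missing value, which matters in the later steps that filter on notnull().

```python
import io
import pandas as pd

# Hypothetical stand-in for covid_19_clean_complete.csv:
# same column names, made-up values
csv_text = """Province/State,Country/Region,Lat,Long,Date,Confirmed,Deaths,Recovered
,Afghanistan,33.0,65.0,2020-01-22,0,0,0
,Afghanistan,33.0,65.0,2020-01-23,0,0,0
Hubei,China,30.97,112.27,2020-01-22,444,17,28
Hubei,China,30.97,112.27,2020-01-23,549,24,31
"""

# read_csv accepts any file-like object, so StringIO works in place of a path
sample = pd.read_csv(io.StringIO(csv_text))
print(sample.head())

# Empty Province/State fields are read as missing values (NaN)
print(sample['Province/State'].isna().sum())
```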

2. Extracting the rows whose Country/Region column value is Afghanistan

df[df['Country/Region']=='Afghanistan'].head()
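
The filter above works in two stages: the comparison produces a boolean Series (a mask), and indexing the dataframe with that mask keeps only the rows marked True. Here is a minimal sketch on a made-up frame; `toy`, `mask`, and the values are assumptions for illustration, not part of the dataset.

```python
import pandas as pd

# Made-up rows using one of the dataset's column names
toy = pd.DataFrame({
    'Country/Region': ['Afghanistan', 'China', 'Afghanistan', 'Italy'],
    'Confirmed': [0, 444, 1, 2],
})

# Stage 1: the comparison yields one True/False value per row
mask = toy['Country/Region'] == 'Afghanistan'
print(mask.tolist())

# Stage 2: indexing with the mask keeps only the True rows
afghanistan_rows = toy[mask]

# isin() builds the same kind of mask for several values at once
several = toy[toy['Country/Region'].isin(['Afghanistan', 'Italy'])]
print(len(afghanistan_rows), len(several))
```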

3. Displaying the column names

df.columns
O/P: Index(['Province/State', 'Country/Region', 'Lat', 'Long', 'Date', 'Confirmed', 'Deaths', 'Recovered'], dtype='object')

4. Displaying the index, which shows the total number of rows in the dataset

df.index
O/P: RangeIndex(start=0, stop=21484, step=1)
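
If only the row count is needed, len() or the shape attribute gives it directly as a number. A small sketch on a made-up frame (`toy` is a hypothetical name):

```python
import pandas as pd

toy = pd.DataFrame({'Confirmed': [0, 444, 1]})

# len() of a dataframe is its number of rows
n_rows = len(toy)

# shape is a (rows, columns) tuple
n_rows_from_shape, n_cols = toy.shape
print(n_rows, n_rows_from_shape, n_cols)
```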

5. Setting the index of the dataframe to ‘Country/Region’

df.set_index('Country/Region',inplace=True)
df.head()
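
As an alternative to inplace=True, set_index can return a new dataframe and leave the original untouched. Once a column is the index, rows can be looked up by label with .loc. The sketch below uses a made-up frame; `toy`, `indexed`, and `china_rows` are hypothetical names.

```python
import pandas as pd

toy = pd.DataFrame({
    'Country/Region': ['Afghanistan', 'China', 'China'],
    'Confirmed': [0, 444, 549],
})

# Without inplace=True, set_index returns a new frame; toy is unchanged
indexed = toy.set_index('Country/Region')

# Passing a list to .loc always returns a dataframe,
# even when only one row carries the label
china_rows = indexed.loc[['China']]
print(china_rows)
```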

6. Displaying the dataframe which has index set to ‘Country/Region’ and Province/State is not null

df[df['Province/State'].notnull()].head()

7. Resetting the index

df.reset_index(inplace=True)
df.head()
df[df['Province/State'].notnull()].head()
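
One side effect worth knowing: after a set_index followed by reset_index, the former index column comes back as the first column, so the column order can differ from the original file. A minimal sketch on a made-up frame (all names and values here are assumptions for illustration):

```python
import pandas as pd

toy = pd.DataFrame({
    'Province/State': [None, 'Hubei'],
    'Country/Region': ['Afghanistan', 'China'],
    'Confirmed': [0, 444],
})

indexed = toy.set_index('Country/Region')
restored = indexed.reset_index()

# The restored column is placed first, ahead of Province/State
print(list(toy.columns))
print(list(restored.columns))
```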

8. Retrieving the first 5 rows and first 3 columns

df.iloc[0:5,0:3]

9. Retrieving rows 0 through 20 (21 rows, since .loc includes the end label) of Country/Region, Province/State, Lat, Long

df1 = df.loc[0:20,['Country/Region','Province/State','Lat','Long']]
df1
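
The two indexers treat the end of a slice differently: .iloc slices by position and excludes the end, while .loc slices by label and includes it. A small sketch on a made-up frame with a default integer index (`toy`, `by_position`, and `by_label` are hypothetical names):

```python
import pandas as pd

toy = pd.DataFrame({'value': range(30)})

# Position-based: stops before position 20, like a Python list slice
by_position = toy.iloc[0:20]

# Label-based: includes the row labelled 20
by_label = toy.loc[0:20]

print(len(by_position), len(by_label))
```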

10. Retrieving the rows from the new dataframe df1 where Province/State is not null

df1[df1['Province/State'].notnull()].head()

11. Creating the variable X to store the independent features by dropping the dependent feature (Recovered)

X = df.drop('Recovered',axis=1)
X.head()
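
If other columns should be left out of the features as well, drop accepts a list. Which columns count as unnecessary depends on the model, so the choice of Lat and Long below is only an assumption for illustration, as are the frame `toy` and its values.

```python
import pandas as pd

toy = pd.DataFrame({
    'Country/Region': ['Afghanistan', 'China'],
    'Lat': [33.0, 30.97],
    'Long': [65.0, 112.27],
    'Confirmed': [0, 444],
    'Deaths': [0, 17],
    'Recovered': [0, 28],
})

# columns=[...] is equivalent to passing the labels with axis=1
X_small = toy.drop(columns=['Recovered', 'Lat', 'Long'])
print(list(X_small.columns))

# drop returns a copy, so the original frame still has every column
print('Recovered' in toy.columns)
```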

12. Printing the shape of X

X.shape
O/P: (21484, 7)

13. Creating the variable y to store the label/dependent variable

y = df['Recovered']
y.head()

14. Printing the shape of variable y

y.shape
O/P: (21484,)
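
The two steps can be wrapped in a small helper together with a sanity check that X and y still line up row for row. This is a sketch under stated assumptions: `split_features_target` is a hypothetical function name, not part of pandas, and the frame `toy` is made up.

```python
import pandas as pd

def split_features_target(frame, target):
    """Hypothetical helper: return (X, y) with the target column removed from X."""
    X = frame.drop(columns=[target])
    y = frame[target]
    return X, y

toy = pd.DataFrame({
    'Confirmed': [0, 444, 549],
    'Deaths': [0, 17, 24],
    'Recovered': [0, 28, 31],
})

X_toy, y_toy = split_features_target(toy, 'Recovered')

# X and y must have the same number of rows and the same index
same_length = X_toy.shape[0] == y_toy.shape[0]
same_index = X_toy.index.equals(y_toy.index)
print(X_toy.shape, y_toy.shape, same_length, same_index)
```

X is two-dimensional (rows by features) while y is one-dimensional, which is why its shape has a single entry.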