In the previous tutorial, we went over the working of Linear Regression using a single independent variable, i.e. "x". But in real-life scenarios, the data is rarely as simple as in that example. A dataset might have a lot of different features. We might do feature selection and reduce the number of features, but there will still be more than one feature.
Hence, to solve this problem, we require a hypothesis equation capable of handling more than one independent variable "x". So, how can we represent that?
Well, by adding more "x's" to the equation. Easy as that.
So, the new hypothesis for multiple variables can be written in the simple form:

y = m1*x1 + m2*x2 + ... + mn*xn + b

Here in this tutorial, we will be using the most basic version of this equation for the hypothesis, i.e. with two features:

y = m1*x1 + m2*x2 + b
So, what does this equation tell us? It tells us that we have two features and one bias term. For each feature we have a coefficient, m1 and m2, which are also called "weights".
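To make this concrete, here is a minimal sketch of the two-feature hypothesis as a plain Python function (the values passed in the example are made up purely for illustration):

# A minimal sketch of the two-feature hypothesis: y_pred = m1*x1 + m2*x2 + b
def hypothesis(x1, x2, m1, m2, b):
    return m1*x1 + m2*x2 + b

# Example with made-up values: 0.5*2 + 1.5*3 + 1 = 6.5
print(hypothesis(2, 3, m1=0.5, m2=1.5, b=1))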
So, let's get started....
numpy: for numerical analysis
pandas: to load and modify the data
seaborn: for statistical plots such as the pairplot used below
matplotlib: to plot the data
# Import Dependencies
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib import style
style.use('ggplot')
%matplotlib inline
So, once we are done with this, let's first load and visualize our data. Since we require multiple features, the Swedish Insurance dataset that we have been using (which has only a single feature) would not work well here...
For this tutorial, we'll be using the "Diamonds" dataset, where the aim is to predict/estimate the price of a diamond given input features like its Cut, Carat, Color, Clarity, Depth, etc.
So, let's have a look at our data...
# Load Dataset
df = pd.read_csv('diamonds.csv')
# Let's have a look at our data
print(df.head())
Looks pretty good. Let's check whether the data is balanced or not... How do we do that? Well, we describe the data...
# Describe the Data
df.describe()
Well, that seems balanced to me. What next? Feature selection...
We have seen above what columns our dataset has. But are all the features equally important? Maybe, maybe not. So, how do we find that? One way is to visualize the pairwise relationships between the columns...
# Pairwise relationship for all Columns in dataset
sns.pairplot(df)
The pairplot gives us a visual sense of the relationships, but a more compact way is to look at the correlation among the features. Let's do it...
# Finding Correlation among features (numeric columns only)
df.corr(numeric_only=True)
Well, it looks like Carat gives us pretty good information, but wait a minute... Aren't we missing a feature? If you look at the data described above, you will realize that we are missing the feature "Cut".
We know that when choosing a diamond, the "Cut" also plays a major role in deciding its price. But here the cut is given in the form of labels. So, how can we represent them?
Well, one way is to map these labels to an integer representing their quality, i.e. a Fair diamond can be represented as "1", a Good one by "2", and so on. This gives us a way to quantify the feature "Cut".
So, how can we do this? Well, one way is to use a dictionary with these values and map all the values in the dataframe to them. Let's do it.
# Create a dictionary mapping the feature "Cut"
d = {'Fair':1, 'Good':2, 'Very Good':3, 'Premium':4, 'Ideal':5}
# Map/Replace all values in the dataframe using the dictionary
df['cut'] = df['cut'].replace(d)
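As a small optional check (not in the original notebook), we can confirm that every label was covered by the dictionary:

# After the mapping, 'cut' should only contain the integers 1 to 5
print(df['cut'].unique())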
Let's look at the dataframe once again. Do we see any difference?
# Print dataframe
df.head()
Well, now we have quantified the feature "Cut" and have numerical values for it. Let's check the correlation again.
# Correlation once again (cut is now included since it is numeric)
df.corr(numeric_only=True)
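Since what we really care about is how each feature relates to the target, we could also look at just the "price" column of the correlation matrix (a small optional step, not part of the original flow):

# Correlation of every numeric feature with the target 'price', strongest first
print(df.corr(numeric_only=True)['price'].sort_values(ascending=False))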
# Load the data in the form to be input to the function for Best Fit Line
X1 = np.array(df['carat'], dtype=np.float64)
X2 = np.array(df['cut'], dtype=np.float64)
y = np.array(df['price'], dtype=np.float64)
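Before moving on, a quick sanity check (purely optional) that the three arrays line up:

# All three arrays should have the same length: one entry per diamond
print(X1.shape, X2.shape, y.shape)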
As we remember, for this tutorial we are using the squared error as our cost function. So, for a single variable, our cost function was as follows:

Cost(m, b) = (1/2n) * Σ (y_pred - y)²

where,

y_pred = m*x + b and n is the number of data points.
So, what will be the change in the cost function for multiple variables?
Yes, you guessed it right. Only our equation for y_pred has changed. So, now the cost function would be:

Cost(m1, m2, b) = (1/2n) * Σ (y_pred - y)²

where,

y_pred = m1*x1 + m2*x2 + b
So, let's write the function for the same...
# Cost Function
def cost_Function(m1, m2, b, X1, X2, y):
    # Mean squared error (halved) between predictions and actual values
    return np.sum(((m1*X1 + m2*X2 + b) - y)**2) / (2*float(len(X1)))
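As a quick, purely illustrative sanity check (with made-up numbers, not the diamonds data), the cost should be zero when the predictions match the targets exactly:

# Toy example: with m1 = m2 = 1 and b = 1 the predictions equal y, so the cost is 0
toy_X1 = np.array([1.0, 2.0])
toy_X2 = np.array([1.0, 1.0])
toy_y = np.array([3.0, 4.0])
print(cost_Function(1, 1, 1, toy_X1, toy_X2, toy_y))   # -> 0.0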
What changes would you have in the Gradient Descent equations? You are right once again! Earlier we were taking gradients with respect to m and b; now it would be with respect to m1, m2 and b, and similarly the updates.
So,

gradient_m1 = (1/n) * Σ x1 * (y_pred - y)
gradient_m2 = (1/n) * Σ x2 * (y_pred - y)
gradient_b  = (1/n) * Σ (y_pred - y)

and the updates are:

m1 = m1 - alpha * gradient_m1
m2 = m2 - alpha * gradient_m2
b  = b  - alpha * gradient_b
So, let's write the function for this.
# Gradient Descent
# X1,X2,y: Input Data Points
# m1,m2,b: Initial Slopes and Bias
# alpha: Learning Rate
# iters: Number of Iterations for which we need to run Gradient Descent.
def gradientDescent(X1, X2, y, m1, m2, b, alpha, iters):
    # n: Number of data points
    n = float(len(X1))
    # Array to store values of error for analysis
    hist = []
    # Perform Gradient Descent for iters iterations
    for _ in range(iters):
        # Prediction error with the current values of m1, m2 and b
        error = (m1*X1 + m2*X2 + b) - y
        # Gradients of the cost function w.r.t. m1, m2 and b
        gradient_m1 = np.sum(X1 * error) / n
        gradient_m2 = np.sum(X2 * error) / n
        gradient_b = np.sum(error) / n
        # Update the parameters by stepping against the gradient
        m1 = m1 - (alpha * gradient_m1)
        m2 = m2 - (alpha * gradient_m2)
        b = b - (alpha * gradient_b)
        # Calculate the error with the new values of m1, m2 and b
        hist.append(cost_Function(m1, m2, b, X1, X2, y))
    return [m1, m2, b, hist]
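Before running this on the full dataset, we could verify it on a tiny synthetic problem where we know the answer (the numbers below are made up purely for illustration):

# Synthetic data generated from y = 2*x1 + 3*x2 + 1, so gradient descent should
# recover values close to m1 = 2, m2 = 3 and b = 1
toy_X1 = np.array([0.0, 1.0, 2.0, 3.0])
toy_X2 = np.array([1.0, 0.0, 1.0, 2.0])
toy_y = 2*toy_X1 + 3*toy_X2 + 1
print(gradientDescent(toy_X1, toy_X2, toy_y, 0, 0, 0, 0.05, 5000)[:3])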
Now that we are all done with the functions required, let's get this thing up and running.
# Main Function
if __name__ == '__main__':
    # Learning Rate
    # lr = 0.01; iters: 500; error: 492764
    # lr = 0.005; iters: 500; error: 492757
    # lr = 0.0001; iters: 2000; error: 485339
    # lr = 0.001; iters: 1500-1600; error: 492736
    # lr = 0.002; iters: 100; error: 488696
    lr = 0.001
    # Initial Values of "m1", "m2" and "b"
    initial_m1 = 0
    initial_m2 = 0
    initial_b = 0
    # Number of Iterations
    iterations = 1500
    # Check error with initial values of m1, m2 and b
    print("Initial Error at m1 = {0}, m2 = {1} and b = {2} is error = {3}".format(initial_m1, initial_m2, initial_b, cost_Function(initial_m1, initial_m2, initial_b, X1, X2, y)))
    print("Starting gradient descent...")
    # Run Gradient Descent to get new values for "m1", "m2" and "b"
    [m1, m2, b, hist] = gradientDescent(X1, X2, y, initial_m1, initial_m2, initial_b, lr, iterations)
    # New values of "m1", "m2" and "b" after Gradient Descent
    print('Values obtained after {0} iterations are m1 = {1}, m2 = {2} and b = {3} and error = {4}'.format(iterations, m1, m2, b, cost_Function(m1, m2, b, X1, X2, y)))
    # Calculating y_hat (predictions with the learned parameters)
    y_hat = (m1*X1 + m2*X2 + b)
    print('y_hat: ', y_hat)
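To get a feel for what the model produces, we could also compare a few of the predictions against the actual prices (an optional check, not part of the original run; it assumes the cell above has already been executed):

# Compare the first few predicted prices against the actual ones
for i in range(5):
    print('actual: {0:.2f}, predicted: {1:.2f}'.format(y[i], y_hat[i]))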
As we can clearly see, we have reduced our initial error over time using Gradient Descent. Let's plot the cost from each iteration to see how the error decreases.
# Plot the decreasing cost
fig, ax = plt.subplots()
ax.plot(hist)
ax.set_xlabel('Iteration')
ax.set_ylabel('Cost')
ax.set_title('Cost / Error Decay')
So, this shows the working of Linear Regression using Multiple Variables.