Hello Everyone. In this tutorial, we will implement Logistic Regression from scratch.
So, let's get started.
# Import Dependencies
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
import matplotlib.pyplot as plt
%matplotlib inline
The next step is to define our hypothesis. As we saw in the theory part, the hypothesis for Logistic Regression is the same as the one for Linear Regression, with one minor difference: in Logistic Regression we take the Sigmoid of the hypothesis before going any further with it.
So, let's define these things first.
# Sigmoid Function
# sigmoid(z) = 1 / (1 + exp(-z))
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothesis Function
# y_hat = m*X
# Hypothesis = sigmoid(y_hat) => sigmoid(m*X)
def hypothesis(m, X):
    z = 0
    for i in range(len(m)):
        z += X[i] * m[i]
    return sigmoid(z)
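To sanity-check these two helpers, here is a quick illustrative run with made-up numbers (not from any dataset):
# Quick sanity check with made-up values
print(sigmoid(0))                        # 0.5, since exp(0) = 1
print(hypothesis([0.5, -0.25], [2, 4]))  # sigmoid(0.5*2 - 0.25*4) = sigmoid(0) = 0.5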
So, if we go back to the theory for Logistic Regression and look at the mathematics behind it, our next step is to form a Cost Function for Logistic Regression.
So, let's write our Cost Function.
In theory, we defined our final Cost Function expression as follows:

J(m) = (-1/n) * Σ [ y * log(h(X)) + (1 - y) * log(1 - h(X)) ]

where,

h(X) = sigmoid(m*X) = 1 / (1 + exp(-m*X))

Also, we know that according to the value of the class i.e. 0 or 1, the cost function reduces to:

Cost = -log(h(X))        if y = 1
Cost = -log(1 - h(X))    if y = 0
So, let's write our cost function using these equations.
# Cost Function
# J = (-1/n) [y log(sigmoid(mX)) + (1 - y) log(1 - sigmoid(mX))]
def costFunction(X, y, m):
    errorSum = 0
    error = 0
    n = len(y)
    for i in range(n):
        hy = hypothesis(m, X[i])
        if y[i] == 1:
            error = y[i] * np.log(hy)
        elif y[i] == 0:
            error = (1 - y[i]) * np.log(1 - hy)
        errorSum += error
    J = (-1/n) * errorSum
    return J
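As a quick check that the function behaves as expected, here is a toy call with made-up values (two samples, one feature, slope m = [1.0]):
# Toy check (illustrative values only)
X_toy = [[2.0], [-1.0]]
y_toy = [1, 0]
print(costFunction(X_toy, y_toy, [1.0]))
# Sample 1 (y=1) contributes -log(sigmoid(2)), sample 2 (y=0) contributes -log(1 - sigmoid(-1));
# J is the average of the two, roughly 0.22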
So, now that we have defined our Cost Function, it's time to define our Gradient Descent Function. So, let's do it.
How does Gradient Descent work? We find the derivative of the Cost Function w.r.t. "m", i.e. dJ/dm, and then use it to update the value of m as:

m := m - lr * (dJ/dm)

where,

lr = Learning Rate

So, what is dJ/dm for this case? Let's calculate it. Our Cost Function is:

J(m) = (-1/n) * Σ [ y * log(h) + (1 - y) * log(1 - h) ]   ... (i)

where,

h = sigmoid(z) = 1 / (1 + exp(-z))   and   z = m*X

So, let's go step by step. Taking the first log term:

log(h) = log(1 / (1 + exp(-z))) = -log(1 + exp(-z))

Now the second part of the equation:

log(1 - h) = log(exp(-z) / (1 + exp(-z)))

which, using the property of "log" that log(a/b) = log(a) - log(b), can be written as:

log(1 - h) = -z - log(1 + exp(-z))

Putting all these values into the equation of the Cost Function, eqn (i), we get:

J(m) = (1/n) * Σ [ (1 - y) * z + log(1 + exp(-z)) ]

Now we have the simplified cost function. Let's calculate its derivative w.r.t. a single slope m_j, remembering that dz/dm_j = X_j:

d/dm_j [ (1 - y) * z ] = (1 - y) * X_j
d/dm_j [ log(1 + exp(-z)) ] = -(1 - sigmoid(z)) * X_j = (h - 1) * X_j

Adding the two terms, we get the following final equation:

dJ/dm_j = (1/n) * Σ (h - y) * X_j = (1/n) * Σ (sigmoid(m*X) - y) * X_j

So, to update the value of "m", we get the equation:

m_j := m_j - (lr/n) * Σ (sigmoid(m*X) - y) * X_j
Let's implement this function...
# Gradient Descent
# X,y: Features, Labels
# m: Slopes
# lr: Learning Rate
def gradientDescend(X, y, m, lr):
    new_m = []
    n = len(y)
    const = float(lr) / float(n)
    for j in range(len(m)):
        errorSum = 0
        for i in range(n):
            Xij = X[i][j]
            hi = hypothesis(m, X[i])
            error = (hi - y[i]) * Xij
            errorSum += error
        # Gradient step for the j-th slope: m_j - (lr/n) * sum((h - y) * X_j)
        step = const * errorSum
        updated_m = m[j] - step
        new_m.append(updated_m)
    return new_m
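As a side note, the same update can be written in vectorized NumPy form, which avoids the explicit Python loops and is usually faster. This is just an optional sketch (not part of the original walkthrough), assuming X is an (n, d) array, y an array of 0/1 labels and m an array of d slopes:
# Vectorized alternative (optional sketch)
def gradientDescendVectorized(X, y, m, lr):
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    m = np.asarray(m, dtype=float)
    preds = 1.0 / (1.0 + np.exp(-X.dot(m)))  # sigmoid(m*X) for every sample at once
    grad = X.T.dot(preds - y) / len(y)       # dJ/dm averaged over the dataset
    return m - lr * grad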
Now that we are done with our Gradient Descent function, one last thing is left. This function currently runs only once, whereas we know that to minimize the loss/cost, gradient descent requires more than one step down the slope.
So, let's define that last function.
# Runner Function
# X: Features
# y: Labels
# lr: Learning Rate
# m: Slopes
# iters: Number of Iterations
def runner(X, y, lr, m, iters):
    hist = []
    print('Starting Gradient Descent...\n')
    for x in range(iters):
        m = gradientDescend(X, y, m, lr)
        cost = costFunction(X, y, m)
        hist.append(cost)
        # Print the information at every 500th step
        if x % 500 == 0:
            print('After {} Iterations...'.format(x))
            print('m: ', m)
            print('Cost ', cost)
            print('\n')
    return [m, hist]
So, now that we are all done with our functions, let's load the data and finally test all the hard work we have done so far. Hold on just a bit longer...
For this tutorial, we will be using the "Haberman's Survival Dataset". You can download this dataset from here [https://archive.ics.uci.edu/ml/datasets/Haberman's+Survival].
So, what is this dataset all about? Well, it contains cases from a study that was conducted between 1958 and 1970 at the University of Chicago's Billings Hospital on the survival of patients who had undergone surgery for breast cancer.
This dataset has the following attributes:
- Age of patient at the time of operation
- Patient's year of operation
- Number of positive axillary nodes detected
- Survival status (the class attribute: whether the patient survived 5 years or longer)
So, as you can see, this dataset classifies whether a person survived 5 years or longer after the operation based on their age, year of operation and the number of positive axillary nodes detected.
Let's load the dataset and have a look at it.
# Load the Dataset
df = pd.read_csv('dataset/haberman-data.csv')
# Let's have a look at it
df.head()
So, it looks like our data does not have any column names. Let's add names to the columns.
# Let's give names to the Columns
df.columns = ['age','year_operation','pos_auxillary_nodes','survival_status']
# Let's check the data again
df.head()
Well, it looks good. All columns have a name now.
Let's describe it as well to check whether it's balanced or not.
# Describe the data
df.describe()
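Note that describe() only gives summary statistics. A more direct way to check class balance (not shown in the original walkthrough) is to count the samples in each class:
# Count samples per class in the label column
df['survival_status'].value_counts()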
The dataset looks pretty balanced to me. Now, on to the next step: feature selection.
So, how do we do this? Correlation. Yes, you are right. Let's do it.
# Correlation
df.corr()
Since this dataset has only a few features, and from the correlation report all of them look important, let's take them all.
# Features:
X = np.array(df[['age','year_operation','pos_auxillary_nodes']])
# Printing first 10 features
print('Features: ',X[:10])
# Preprocessing the data to normalize the data points and bring them to the same scale
# Using MinMaxScaler brings all data points into the range between -1 and 1
min_max_scaler = MinMaxScaler(feature_range=(-1,1))
X = min_max_scaler.fit_transform(X)
# Printing First 10 Processed Features
print('Preprocessed Features: ',X[:10])
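To make the scaling step concrete, here is what MinMaxScaler(feature_range=(-1, 1)) does to a tiny made-up column (illustrative numbers only): it applies X_scaled = (X - X_min) / (X_max - X_min) * 2 - 1, so the minimum maps to -1 and the maximum to 1.
# Illustration on a made-up column
demo = np.array([[30.0], [52.0], [83.0]])
print(MinMaxScaler(feature_range=(-1, 1)).fit_transform(demo))
# 30 -> -1.0 (minimum), 83 -> 1.0 (maximum), 52 -> roughly -0.17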
# Labels
y = np.array(df['survival_status'])
print('Labels: ',y)
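One thing to watch out for: in the original Haberman data the survival_status column is coded as 1 (survived 5 years or longer) and 2 (died within 5 years), while our costFunction and the gradient derivation assume 0/1 labels. If your copy of the CSV uses the 1/2 coding, a minimal remapping sketch would be:
# Remap survival_status from {1, 2} to {1, 0} so it matches the 0/1 assumption above
# (skip this step if your labels are already 0/1)
y = np.where(y == 2, 0, 1)
print('Labels after remapping: ', y[:10])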
Now, since we have three features, we require three slopes, one for each feature.
So, the equation looks like:

y_hat = sigmoid(m1*X1 + m2*X2 + m3*X3)

Here, we will define the slopes as an array and refer to them as follows:

m[0] = m1, m[1] = m2, m[2] = m3

Similarly for the values of X:

X[0] = X1 (age), X[1] = X2 (year of operation), X[2] = X3 (positive axillary nodes)
# Training Parameters
# Initial Slopes [m1,m2,m3]
initial_m = [0,0,0]
# Learning Rate
learning_Rate = 0.01
# Number of Iterations
iterations = 2000
# Initial Cost
print('Initial Cost with m1 = {0}, m2 = {1} and m3 = {2} is Cost = {3}'.format(initial_m[0],initial_m[1],initial_m[2],costFunction(X,y,initial_m)))
So, now that we have defined all the inputs, defined all the functions, it's time to test our classifier. So, let's run it.
# Run the Classifier
[m,hist] = runner(X,y,learning_Rate,initial_m,iterations)
# Final Slopes
print('Value of m after {0} iterations is m1 = {1}, m2 = {2}, m3 = {3}'.format(iterations,m[0],m[1],m[2]))
So, we can clearly see that the Cost decreases and after 2000 iterations we have the values for our three slopes i.e. m1, m2 and m3.
Let's plot the Cost Function for this.
# Plot Cost Function Decay
fig,ax = plt.subplots(figsize=(10,8))
ax.plot(hist)
ax.set_title('Cost Decay')
The cost function plot shows how the cost decays and gradually stabilizes over the 2000 iterations.
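As an optional final check (not part of the original walkthrough, and assuming the 0/1 labels discussed earlier), we could use the learned slopes to classify each training sample and measure the training accuracy:
# Classify each sample with the learned slopes and compute training accuracy
preds = np.array([1 if hypothesis(m, xi) >= 0.5 else 0 for xi in X])
print('Training Accuracy: ', np.mean(preds == y))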
Well, this ends our code for Logistic Regression using Gradient Descent.