In this tutorial, we will raise the bar a bit, since we understand the basics now. So, let's have a look at Google's Tensorflow and how it can do Linear Regression.
You may be surprised at how little code this tool needs for a task like this.
So, let's get started...
numpy: for numerical analysis
pandas: to read and modify data
tensorflow: Google's library for Machine Learning and Deep Learning
Matplotlib: to plot the data
# Import Dependencies
import numpy as np
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt
from matplotlib import style
style.use('ggplot')
%matplotlib inline
So, like in all the previous tutorials, step 2 remains the same, i.e. Load the Data. Let's do it.
For this tutorial, I'll be using the Swedish Insurance dataset to show the working of Linear Regression in Tensorflow.
# Load Dataset
df = pd.read_csv('dataset/Insurance-dataset.csv')
# Let's have a look at the data
df.head()
# Checking the summary statistics (count, mean, std, min, max) of each column
df.describe()
It's all ok. So, what's the next step? Well, let's separate the features and labels.
# Training Data
train_X = np.array(df['X'], dtype=np.float64)
train_y = np.array(df['Y'], dtype=np.float64)
# Let's also define other Training Parameters
learningRate = 0.001 # Learning Rate
iterations = 5000 # Number of Training Iterations
step_size = 50 # Will be used to display log in steps of 50
n = train_X.shape[0] # Number of Training Samples in a column.
# Assigning Placeholders for Input Data
# A Placeholder is a slot that we fill with data at run time, i.e. when we run our Tensorflow Session.
# We will feed train_X and train_y into these placeholders whenever a computation needs them.
# Note: tf.placeholder belongs to the Tensorflow 1.x API; in Tensorflow 2.x it is available as tf.compat.v1.placeholder.
X = tf.placeholder('float')
Y = tf.placeholder('float')
The next step is to initialize the "Weight", i.e. "m", and the "Bias", i.e. "b".
In Tensorflow, when we need a value that starts from an initial value and is then updated during training, we define it as a "Variable".
Weights and bias are usually initialized either to zero or to random values; in this tutorial we will use random values.
According to Tensorflow:
A variable maintains state in the graph across calls to run(). You add a variable to the graph by constructing an instance of the class Variable. The Variable() constructor requires an initial value for the variable, which can be a Tensor of any type and shape. The initial value defines the type and shape of the variable. After construction, the type and shape of the variable are fixed. The value can be changed using one of the assign methods.
# Define Trainable Variables
m = tf.Variable(np.random.randn(), name='weight')
b = tf.Variable(np.random.randn(), name='bias')
Well, now that we have defined all the important components, let's write the equation for the "Hypothesis" or "Prediction".
# Hypothesis
# y_hat = m*X + b
y_hat = tf.add(tf.multiply(m,X),b)
# Cost Function
# J = sum((y_hat - y) ^ 2) / (2*n)
def costFunction(y_hat, y, samples):
    return tf.reduce_sum(tf.pow(y_hat-y,2))/(2*samples)
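To make the formula concrete, here is a small plain-Python sketch of the same cost on made-up numbers. The function name `half_mse` and the toy values are assumptions for illustration only; they are not taken from the Swedish Insurance dataset.

```python
def half_mse(m, b, xs, ys):
    """Return sum((m*x + b - y)^2) / (2*n) for the line y_hat = m*x + b."""
    n = len(xs)
    squared_errors = [(m * x + b - y) ** 2 for x, y in zip(xs, ys)]
    return sum(squared_errors) / (2 * n)

# Toy data (hypothetical): the points lie exactly on y = 2*x
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

cost_bad = half_mse(1.0, 0.0, xs, ys)   # a poor guess: m=1, b=0
cost_good = half_mse(2.0, 0.0, xs, ys)  # the exact line: m=2, b=0
print(cost_bad, cost_good)
```

The poor guess gives a positive cost and the exact line gives zero, which is exactly the signal the optimizer uses to decide whether "m" and "b" are getting better.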
So, the next step is to define Gradient Descent. Let's do it.
# Gradient Descent Optimizer
# Inputs: Learning Rate and Cost from Cost Function
# Aim: To minimize Cost Function
# lr: Learning Rate
# cost: Cost Function Output [J]
def gradientDescent(lr, cost):
    return tf.train.GradientDescentOptimizer(learning_rate=lr).minimize(cost)
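The call to `minimize(cost)` bundles two things: computing the gradients of J with respect to "m" and "b", and moving both values a small step against those gradients. Tensorflow works the gradients out automatically; the sketch below writes them out by hand for our cost function, in plain Python, just to show the idea. The helper name `gradient_step`, the learning rate and the toy numbers are all hypothetical.

```python
def gradient_step(m, b, xs, ys, lr):
    """One gradient descent update for J = sum((m*x + b - y)^2) / (2*n)."""
    n = len(xs)
    errors = [m * x + b - y for x, y in zip(xs, ys)]
    grad_m = sum(e * x for e, x in zip(errors, xs)) / n  # dJ/dm
    grad_b = sum(errors) / n                             # dJ/db
    # Move against the gradient, scaled by the learning rate
    return m - lr * grad_m, b - lr * grad_b

# Toy data (hypothetical): the points lie exactly on y = 2*x
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

m, b = 0.0, 0.0  # start from zero instead of random values
for _ in range(2000):
    m, b = gradient_step(m, b, xs, ys, lr=0.1)
print(m, b)
```

On this toy data the loop should settle very close to m = 2 and b = 0. The training loop later in this tutorial repeats the same kind of update, only with Tensorflow computing the gradients for us.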
Well, now that all the pieces are defined, let's call our functions and set up the training.
# Mean Squared Error / Cost Function
J = costFunction(y_hat, Y, n)
# Gradient Descent Optimizer
gd = gradientDescent(lr=learningRate, cost=J)
Now that we have called our functions, let's initialize our Tensorflow session and run it.
# Initialize all Variables
init = tf.global_variables_initializer()
# Launch TF Graph Session
with tf.Session() as sess:
    # Initialize Session
    sess.run(init)
    print('Starting Gradient Descent...\n')
    # Run Gradient Descent and Minimize Cost
    for iters in range(iterations):
        # Do Gradient Descent to find out best values for "m" and "b"
        sess.run(gd, feed_dict={X:train_X, Y:train_y})
        # Print every step_size iterations to keep track of the cost
        if (iters + 1) % step_size == 0:
            # Map training values to placeholders and calculate the cost
            c = sess.run(J, feed_dict={X:train_X, Y:train_y})
            print('Step: ', '%04d' % (iters + 1), 'cost: ', '{:.9f}'.format(c))
    print('\n Finished Optimization...')
    training_cost = sess.run(J, feed_dict={X: train_X, Y: train_y})
    print("\n Training Cost:", training_cost , " m:", sess.run(m), " b:", sess.run(b), '\n')
    # Plot the Best Fit Line (still inside the session, since we read m and b from it)
    fig,ax = plt.subplots(figsize=(10,8))
    ax.plot(train_X, train_y, 'ro', label='Original data')
    ax.plot(train_X, sess.run(m) * train_X + sess.run(b), c='g', label='Fitted line')
    ax.set_title('Linear Regression: Tensorflow')
    plt.legend(loc='best')
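A handy sanity check for the trained values: simple linear regression also has a closed-form least-squares solution, so gradient descent that has run long enough should land close to it. Below is a minimal plain-Python sketch of that formula. `closed_form_fit` is a hypothetical helper and the numbers are toy data, not the insurance dataset.

```python
def closed_form_fit(xs, ys):
    """Least-squares slope and intercept for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sum of squared deviations of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data (hypothetical): the points lie exactly on y = 2*x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = closed_form_fit(xs, ys)
print(slope, intercept)
```

On the real data you could call it with `train_X.tolist()` and `train_y.tolist()` and compare the result with the "m" and "b" printed by the session. If the two differ noticeably, the likely causes are too few iterations or a learning rate that is too small for the intercept to settle.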