Predicting Boston Housing Prices using TensorFlow

In this tutorial, we will:

Explore the Boston Housing dataset: what it looks like, which features are available, and what we need to predict.
Implement a simple linear regressor in TensorFlow and evaluate it by plotting the decrease in the cost (loss) function against epochs, along with other metrics.
Implement a LinearRegressor and a DNNRegressor using the TensorFlow Estimator API and see how easy the API makes it to build a regressor.

Requirements

OS: Ubuntu/AWS Cloud/Google Cloud/Windows
Python 3+
TensorFlow
NumPy [+ MKL for Windows]

Just show me the Code !!

The code for this tutorial is available as an IPython notebook here.

So, let's get started.

Exploring Boston Housing Price Dataset

Load Data and Feature Intuition

The first step is to load the dataset and do any preprocessing if necessary. To load the dataset, I'll be using scikit-learn, which ships this dataset together with a description (DESCR) of each feature, the data (the feature values) and the target (the labels).

NOTE: If you download a dataset from a website instead, you might be required to do some preprocessing first, but I am not covering that step here.

Once we have loaded the dataset, let's use the pandas library to get some insight into it: how well the features correlate with the output (the prices), how much the features correlate with each other, and whether a change in a feature has a positive or a negative impact on the output. Another important thing to check is whether the dataset is balanced.
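To make the correlation idea concrete, here is a minimal sketch in plain NumPy. Note the assumptions: recent scikit-learn releases no longer ship this dataset, so the sketch uses a small synthetic table with the same number of rows, and the feature names (rooms, lower_status) and every coefficient are hypothetical values made up for illustration. This is not the notebook's code, just one way the same check could look.

```python
import numpy as np

# Synthetic stand-in for the housing table (assumption: NOT the real data).
rng = np.random.default_rng(0)
n_samples = 506  # same number of rows as the Boston dataset

rooms = rng.normal(6.3, 0.7, n_samples)           # hypothetical feature
lower_status = rng.normal(12.0, 7.0, n_samples)   # hypothetical feature
noise = rng.normal(0.0, 2.0, n_samples)
# Hypothetical relationship: more rooms raise the price, lower status reduces it.
price = 9.0 * rooms - 0.6 * lower_status - 27.0 + noise

features = np.column_stack([rooms, lower_status])
feature_names = ["rooms", "lower_status"]


def correlation_with_target(X, y):
    """Pearson correlation of each column of X with the target y."""
    X_centered = X - X.mean(axis=0)
    y_centered = y - y.mean()
    covariance = (X_centered * y_centered[:, None]).sum(axis=0)
    scale = np.sqrt((X_centered ** 2).sum(axis=0) * (y_centered ** 2).sum())
    return covariance / scale


corr = correlation_with_target(features, price)
for name, r in zip(feature_names, corr):
    direction = "positive" if r > 0 else "negative"
    print(f"{name}: r = {r:+.2f} ({direction} impact on price)")

# Feature-to-feature correlations; np.corrcoef expects variables as rows.
feature_corr = np.corrcoef(features.T)
```

The sign of each coefficient tells us the direction of the impact, and its magnitude tells us how strongly the feature moves with the price. With pandas, calling corr() on a DataFrame computes the same kind of table.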

The above code gives us a lot of useful information about the dataset. It tells us the important features that we must look at while implementing the algorithms for training on this dataset.

Data Preprocessing and Train-Test Split

Now that we know which features are important and how they correlate with each other, we can just go on and implement the algorithm to train it, can't we? Well... not just yet.

Before sending the data to the model, two steps matter: data preprocessing and then a train-test split for validation. The features in this dataset lie in very different ranges: some are in the hundreds while others are small decimals. So, we need to normalize them so that they all lie on the same scale. This way the features with small values are not neglected and the features with large values do not dominate the training.
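As a rough illustration of the normalization step, here is a small NumPy sketch of standardization (zero mean, unit variance per feature). The helper names and the toy numbers are made up for this example, and computing the statistics on the training rows only is a design choice I am assuming here, not necessarily what the notebook does.

```python
import numpy as np


def fit_standardizer(X_train):
    """Compute the per-feature mean and standard deviation from training data."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    # Guard against constant features to avoid dividing by zero.
    std = np.where(std == 0, 1.0, std)
    return mean, std


def standardize(X, mean, std):
    """Rescale each feature to zero mean and unit variance."""
    return (X - mean) / std


# Toy data: one feature in the hundreds, one in small decimals.
X_train = np.array([[300.0, 0.5], [400.0, 0.1], [500.0, 0.3]])
X_test = np.array([[450.0, 0.2]])

mean, std = fit_standardizer(X_train)
X_train_scaled = standardize(X_train, mean, std)
X_test_scaled = standardize(X_test, mean, std)  # reuse the training statistics
```

After scaling, both columns live on a comparable scale, so neither one dominates the gradient updates just because of its units.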

Also, we should not train the model on the whole dataset, because we would then have no unseen data with which to detect overfitting. Since the dataset is small, the model can simply memorize it. This leads to very good performance on the training data, but when we try the model on external data, it performs pretty badly.
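A hold-out split can be sketched in a few lines of NumPy. The function name, the test fraction and the seed below are assumptions for illustration; scikit-learn's train_test_split does the same job if you prefer a library call.

```python
import numpy as np


def train_test_split_arrays(X, y, test_fraction=0.2, seed=0):
    """Shuffle the rows and hold out a fraction of them for testing."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))
    n_test = int(round(len(X) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


# Toy data: 10 rows, where row i holds the features [2i, 2i + 1] and label i.
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(10, dtype=float)

X_train, X_test, y_train, y_test = train_test_split_arrays(X, y, test_fraction=0.3)
print(X_train.shape, X_test.shape)
```

The model only ever sees the training rows; the test rows play the role of the external data mentioned above.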

One more thing to consider is that the preprocessing is applied only to the training and test features and not to the labels, as we need the model's output to stay in the range of the labels.

So, let's get started.

Now that we are done with the hard part, it's time to define our regressor models and see how well they perform at predicting the house prices.

Linear Regression using TensorFlow

In this part, we will implement the linear regression model using pure TensorFlow and no other wrappers around it. We will define our placeholders and variables, and train the model to reduce the cost function using an optimizer.
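To show what the optimizer is doing under the hood, here is a minimal sketch of the same arithmetic in plain NumPy rather than TensorFlow. This is a swapped-in stand-in, not the notebook's TensorFlow code: the data is synthetic, and the learning rate, epoch count and true weights are hypothetical values chosen for the example.

```python
import numpy as np


def train_linear_regression(X, y, learning_rate=0.1, epochs=500):
    """Fit y ~ X @ weights + bias with batch gradient descent on the MSE cost."""
    n_samples, n_features = X.shape
    weights = np.zeros(n_features)
    bias = 0.0
    cost_history = []
    for _ in range(epochs):
        predictions = X @ weights + bias
        errors = predictions - y
        cost_history.append(float(np.mean(errors ** 2)))  # cost for this epoch
        # Gradients of the mean squared error w.r.t. weights and bias.
        grad_w = 2.0 * (X.T @ errors) / n_samples
        grad_b = 2.0 * errors.mean()
        weights -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weights, bias, cost_history


# Synthetic, already-normalized features (assumption: not the real dataset).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 4.0 + rng.normal(scale=0.1, size=200)

weights, bias, cost_history = train_linear_regression(X, y)
print("first cost:", cost_history[0], "last cost:", cost_history[-1])
```

Plotting cost_history against the epoch index gives the kind of decreasing loss curve this tutorial uses to judge the regressor.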

So, let's get started.

So, we just trained a basic TensorFlow model to predict the housing prices.

TensorFlow Linear Regressor using Estimator API

In this part, we'll use the same preprocessed dataset but take a different approach: a Linear Regressor model from TensorFlow's Estimator API.

So, we just trained a Linear Regressor model using TensorFlow's Estimator API to predict the housing prices.

NOTE: The aim of this tutorial is to get you acquainted with the Estimator API and show how it works. The current loss using this API for this example is a bit high; the model can be tuned further to reduce that error and get better predictions.
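When tuning, it helps to look at a couple of error measures side by side instead of the raw loss alone. Here is a small NumPy sketch of three common regression metrics; the function name and the sample numbers are made up for illustration and are not results from this tutorial.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE and R^2 for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_pred - y_true
    mse = float(np.mean(errors ** 2))
    rmse = float(np.sqrt(mse))
    ss_res = float(np.sum(errors ** 2))                     # unexplained variation
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))   # total variation
    r2 = 1.0 - ss_res / ss_tot
    return {"mse": mse, "rmse": rmse, "r2": r2}


# Made-up labels and predictions, purely for illustration.
y_true = [20.0, 22.0, 30.0, 32.0]
y_pred = [21.0, 21.0, 29.0, 33.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

RMSE is in the same units as the prices, which makes it easier to read than the squared loss, while R^2 says how much of the variation in the prices the model explains.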

TensorFlow DNN Regressor using Estimator API

In this part, we'll again use the same preprocessed dataset, this time with a DNN Regressor model from TensorFlow's Estimator API.
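To give a feel for what kind of model a DNN regressor is, here is a rough NumPy sketch of the forward pass of a small fully connected network with ReLU hidden layers and a single linear output. This is not the Estimator API and involves no training; the layer sizes, the initialization and the random input batch are all assumptions made up for this example.

```python
import numpy as np


def init_layers(layer_sizes, seed=0):
    """Create (weights, bias) pairs for consecutive fully connected layers."""
    rng = np.random.default_rng(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        w = rng.normal(scale=np.sqrt(2.0 / n_in), size=(n_in, n_out))
        b = np.zeros(n_out)
        layers.append((w, b))
    return layers


def dnn_predict(X, layers):
    """Forward pass: ReLU on the hidden layers, linear output for regression."""
    activations = X
    for w, b in layers[:-1]:
        activations = np.maximum(0.0, activations @ w + b)
    w_out, b_out = layers[-1]
    return (activations @ w_out + b_out).ravel()


# 13 input features (as in the Boston dataset), two hypothetical hidden layers.
layers = init_layers([13, 16, 8, 1])
X_batch = np.random.default_rng(1).normal(size=(5, 13))
predictions = dnn_predict(X_batch, layers)
print(predictions.shape)
```

Compared with the linear model, the hidden layers let the network capture non-linear relationships between the features and the price; the Estimator takes care of building and training such a network for us.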

So, we just trained a DNN Regressor model using TensorFlow's Estimator API to predict the housing prices.

So, we have reached the end of this tutorial. I hope I was able to show you how we can use TensorFlow and the Estimator API for a regression task.

In the next tutorial, we'll go over how to use TensorFlow and the Estimator API for a classification task.

For more projects and code, follow me on GitHub

Please feel free to leave any comments, suggestions or corrections :)