TensorFlow is a free and open-source software library for dataflow and differentiable programming across a range of tasks. It is a symbolic math library and is also used for machine learning applications such as neural networks. It is used for both research and production by many teams across Google products, such as speech recognition, Gmail, Google Photos, and search. TensorFlow was originally developed by the Google Brain team for Google's research and production purposes and was later released under the Apache 2.0 open source license on November 9, 2015.
TensorFlow changes rapidly, so it is worth checking which version you have installed:
import tensorflow as tf
print("Tensor Flow Version: {}".format(tf.__version__))
TensorFlow also has competition. The biggest competitor to TensorFlow/Keras is PyTorch, one of several popular deep learning libraries that remain actively supported.
TensorFlow is a low-level mathematics API, similar to NumPy. However, unlike NumPy, TensorFlow is built for deep learning: it assembles your operations into compute graphs and compiles those graphs into highly efficient C++/CUDA code.
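As a rough illustration (this sketch is not from the original text), tf.function traces an ordinary Python function into such a graph, which TensorFlow can then optimize and run on CPU or GPU:

import tensorflow as tf

# tf.function traces this Python function into a TensorFlow graph.
@tf.function
def affine(x, w, b):
    return tf.matmul(x, w) + b

x = tf.ones([1, 3])
w = tf.ones([3, 2])
b = tf.zeros([2])
print(affine(x, w, b))  # tf.Tensor([[3. 3.]], shape=(1, 2), dtype=float32)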
TensorFlow is a library for linear algebra. Keras is a higher-level abstraction for neural networks that you build upon TensorFlow. Here, we will do some basic linear algebra that employs TensorFlow directly and does not make use of Keras. First, we will multiply a row matrix by a column matrix.
# Create a constant tensor holding a 1x2 matrix. In TensorFlow 2
# this executes eagerly, so the value returned by the constructor
# is the tensor itself rather than a node in a graph.
matrix1 = tf.constant([[3.0, 3.0]])
# Create another Constant that produces a 2x1 matrix.
matrix2 = tf.constant([[2.0],[2.0]])
# Multiply 'matrix1' by 'matrix2'. The returned value, 'product',
# is the result of the matrix multiplication.
product = tf.matmul(matrix1, matrix2)
print(f'Matrix1 shape: {matrix1.shape}')
print(f'Matrix2 shape: {matrix2.shape}')
print('\n',product)
print('\nProduct =',float(product))
Here, we will see how to subtract a constant from a variable.
x = tf.Variable([1.0, 2.0])
a = tf.constant([3.0, 3.0])
# Subtract 'a' from 'x'.
sub = tf.subtract(x, a)
print(sub)
print(sub.numpy())
Variables are only useful if their values can be changed. This can be accomplished by calling the assign function.
print(x)
x.assign([4.0, 6.0])
print('\n',x)
Now we can perform the subtraction with this new value.
sub = tf.subtract(x, a)
print(sub)
print(sub.numpy())
Keras is a layer on top of TensorFlow that makes it much easier to create neural networks. Rather than defining the graphs, as you saw above, you set up the individual layers of the network with a much higher-level API. Unless you are performing research into entirely new structures of deep neural networks, it is unlikely that you will need to program TensorFlow directly.
This example shows how to encode the MPG dataset for regression. The dataset takes some preprocessing because it mixes numeric and categorical columns and contains missing values. The course helper functions referenced here are:

Predictors/Inputs:
- missing_median
- encode_text_dummy
- encode_numeric_zscore

Output:
- encode_text_index
- to_xy

To encode categorical values that are part of the feature vector, use encode_text_dummy; use encode_text_index if the categorical value is the target. Minimal sketches of what these helpers might look like follow below.
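The following are assumed implementations, shown only for illustration; the course supplies its own versions of these helpers:

import pandas as pd

# Fill missing values in a numeric column with the column median.
def missing_median(df, name):
    df[name] = df[name].fillna(df[name].median())

# Replace a categorical column with one dummy (one-hot) column per category.
def encode_text_dummy(df, name):
    dummies = pd.get_dummies(df[name], prefix=name)
    for col in dummies.columns:
        df[col] = dummies[col]
    df.drop(columns=[name], inplace=True)

# Z-score normalize a numeric column: (x - mean) / std.
def encode_numeric_zscore(df, name):
    df[name] = (df[name] - df[name].mean()) / df[name].std()

# Encode a categorical target as integer indexes; return the class labels.
def encode_text_index(df, name):
    df[name], classes = pd.factorize(df[name])
    return classes

# Convert a DataFrame into (x, y) NumPy arrays for a given target column.
def to_xy(df, target):
    return df.drop(columns=[target]).values, df[target].values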
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
import pandas as pd
import io
import os
import requests
import numpy as np
from sklearn import metrics
url = "https://data.heatonresearch.com/data/t81-558/auto-mpg.csv"
df = pd.read_csv(url, na_values=['NA', '?'])
df.info()
df.head()
cars = df['name']
# Handle missing values: fill gaps in horsepower with the median
df['horsepower'] = df['horsepower'].fillna(df['horsepower'].median())
# Convert the Pandas DataFrame to NumPy arrays
x = df.drop(columns=['name','mpg']).values
# regression target
y = df['mpg'].values
# Build the neural network
model = Sequential()
model.add(Dense(25, input_dim=x.shape[1], activation='relu')) # Hidden 1
model.add(Dense(10, activation='relu')) # Hidden 2
model.add(Dense(1)) # Output
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(x,y,verbose=1,epochs=100)
In the above code, the neural network contains $4$ layers: the input layer, two hidden layers, and the output layer. The first Dense layer defines the input layer as well, through its input_dim parameter, which sets the number of inputs the dataset has. The network needs one input neuron for every column in the data set (including dummy variables).
There are also $2$ hidden layers, with $25$ and $10$ neurons each. You might be wondering how we chose these numbers. Selecting a hidden neuron structure is one of the most common questions about neural networks. Unfortunately, there is no right answer. These are hyperparameters: settings that can affect neural network performance, yet there is no clearly defined means of setting them.
In general, more hidden neurons mean more capability to fit complex problems. However, too many neurons can lead to overfitting and lengthy training times, while too few can underfit the problem and sacrifice accuracy. How many layers you have is another hyperparameter. In general, more layers allow the neural network to perform more of its own feature engineering and data preprocessing, but this comes at the expense of training time and the risk of overfitting. You will generally see neuron counts start larger near the input layer and shrink toward the output layer in a roughly triangular fashion. Some techniques use machine learning to optimize these values.
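For illustration only, a deeper "triangular" stack might look like the following (the layer sizes here are arbitrary; this is not the model used in this example):

# Sketch: neuron counts shrink from input toward output.
wide_model = Sequential()
wide_model.add(Dense(50, input_dim=x.shape[1], activation='relu'))
wide_model.add(Dense(25, activation='relu'))
wide_model.add(Dense(10, activation='relu'))
wide_model.add(Dense(1))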
The program produces one line of output for each training epoch. You can eliminate this output through the verbose parameter of the fit command, as shown below.
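For example, verbose=0 silences the per-epoch output entirely, while verbose=2 prints one line per epoch without the progress bar:

# Same training call as above, but with no per-epoch output.
model.fit(x, y, verbose=0, epochs=100)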
Next, we will perform actual predictions, which we assign to the pred variable. These are all MPG predictions from the neural network. Notice that the result is a 2D array; you can always see the dimensions of what Keras returns by printing pred.shape. Neural networks can return multiple values, so the result is always an array. Here the neural network returns one value per prediction (there are $398$ cars, so $398$ predictions). However, a 2D array is needed because the neural network has the potential to return more than one value per prediction.
pred = model.predict(x)
print(f"Shape: {pred.shape}")
print(pred[0:10])
We would like to see how good these predictions are. We know what the correct MPG is for each car, so we can measure how close the neural network was.
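RMSE (root mean square error) is the square root of the average squared difference between each prediction $\hat{y}_i$ and the expected value $y_i$ over all $n$ cars:

$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2}$$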
# Measure RMSE. RMSE is a common error metric for regression.
score = np.sqrt(metrics.mean_squared_error(y, pred))
print(f"RMSE: {score}")
The number printed above is roughly the average amount by which the predictions were above or below the expected output. We can also print out the first ten cars, with predicted and actual MPG.
# Sample predictions for the first ten cars
for i in range(10):
    print(f"{i+1}. Car name: {cars[i]}, MPG: {y[i]}, predicted MPG: {pred[i]}")
Classification is the process by which a neural network attempts to classify the input into one or more classes. The simplest way of evaluating a classification network is to track the percentage of training set items that were classified incorrectly. We typically score human results in this manner. For example, you might have taken multiple-choice exams in which you had choices A, B, C, or D. If you chose the wrong letter on just one question of a $10$-question exam, you would earn $90\%$. In the same way, we can grade an algorithm; however, most classification algorithms do not merely choose A, B, C, or D. Computers typically report a classification as their percent confidence, or an approximated probability of class membership. Consider how a computer and a human might each respond to question number $1$ on an exam.
The human test taker might simply mark the first question as "B." The algorithm test taker, however, might have $80\%$ $(0.8)$ confidence in "B" while also being somewhat sure, at $10\%$ $(0.1)$, about "A," with the remaining confidence distributed across the other two choices. In the simplest sense, the machine would get $80\%$ of the score for this question if the correct answer were "B." The computer would get only $5\%$ $(0.05)$ of the points if the correct answer were "D."
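To make the arithmetic concrete (a sketch; the letters and confidences are just the values from this example):

import numpy as np

answers = ['A', 'B', 'C', 'D']
confidence = np.array([0.10, 0.80, 0.05, 0.05])  # algorithm's confidence per choice

# Credit received is the confidence placed on the correct answer.
print(confidence[answers.index('B')])  # 0.8  -> 80% of the points if B is correct
print(confidence[answers.index('D')])  # 0.05 ->  5% of the points if D is correct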
That is the idea behind scoring a classification. We will now build a straightforward classification network with TensorFlow, using the Iris dataset.
url = "https://data.heatonresearch.com/data/t81-558/iris.csv"
df = pd.read_csv(url, na_values=['NA', '?'])
df.info()
df.head()
x = df.drop('species', axis=1).values
dummies = pd.get_dummies(df['species'])
species = dummies.columns
y = dummies.values
print(f'Species: {species}')
print('\nTarget:')
print(y[:5])
# Build neural network
model = Sequential()
model.add(Dense(50, input_dim=x.shape[1], activation='relu')) # Hidden 1
model.add(Dense(25, activation='relu')) # Hidden 2
model.add(Dense(y.shape[1],activation='softmax')) # Output
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.fit(x,y,verbose=1,epochs=100)
Now that we have a trained neural network, we would like to use it. The following code generates predictions exactly as before. Notice that three values come back for each of the $150$ iris flowers, one for each of the three species (Iris-setosa, Iris-versicolor, and Iris-virginica).
pred = model.predict(x)
print(f"Shape: {pred.shape}")
print(pred[0:10])
If you would like to turn off scientific notation, the following line can be used:
np.set_printoptions(suppress=True)
Now we see these values printed in fixed-point rather than scientific notation.
print(pred[0:5])
Usually, the program considers the column with the highest prediction to be the prediction of the neural network. It is easy to convert these predictions to the expected iris species: the argmax function finds the index of the maximum prediction for each row.
predict_classes = np.argmax(pred,axis=1)
expected_classes = np.argmax(y,axis=1)
print(f"Predictions: {predict_classes}")
print(f"\nExpected: {expected_classes}")
It is straightforward to turn these indexes back into iris species. We use the species list that we created earlier.
print(species[predict_classes[1:10]])
Here we score with accuracy. It is essentially a test score. For all of the iris predictions, what percent were correct? The downside is it does not consider how confident the neural network was in each prediction.
correct = metrics.accuracy_score(expected_classes,predict_classes)
print(f"Accuracy: {correct}")
The code below performs two ad hoc predictions. The first prediction is for a single iris flower, and the second is for two iris flowers. Notice that the argmax in the second prediction requires axis=1: since we now have a 2D array, we must specify which axis to take the argmax over. The value axis=1 specifies that we want the maximum column index for each row.
One sample flower
sample_flower = np.array( [[5.0,3.0,4.0,2.0]], dtype=float)
pred = model.predict(sample_flower)
print(f'Predictions: \n\n{pred}')
pred = np.argmax(pred)
print(f"\nPredict that {sample_flower} is: {species[pred]}")
Two sample flowers
sample_flower = np.array( [[5.0,3.0,4.0,2.0],[5.2,3.5,1.5,0.8]], dtype=float)
pred = model.predict(sample_flower)
print(f'Predictions: \n\n{pred}')
pred = np.argmax(pred,axis=1)
print(f"\nPredict that these two flowers: \n\n{sample_flower} \n\nare: {list(species[pred])}")
Complex neural networks take a long time to fit/train. It is helpful to be able to save these neural networks so that you can reload them later. A reloaded neural network will not require retraining. Keras provides three formats for saving a neural network: JSON (structure only), YAML (structure only), and HDF5 (structure plus weights). Usually you will want to save in HDF5.
save_path = "." # save to current directory
# save neural network structure to JSON (no weights)
model_json = model.to_json()
with open(os.path.join(save_path,"network.json"), "w") as json_file:
json_file.write(model_json)
# save neural network structure to YAML (no weights)
model_yaml = model.to_yaml()
with open(os.path.join(save_path,"network.yaml"), "w") as yaml_file:
yaml_file.write(model_yaml)
# save entire network to HDF5 (save everything, suggested)
model.save(os.path.join(save_path,"network.h5"))
The code below reloads the network and performs another prediction. It does not refit the neural network; the weights from the previous fit are used. The accuracy should match the previous result exactly if the neural network was really saved and reloaded.
from tensorflow.keras.models import load_model
saved_model = load_model(os.path.join(save_path,"network.h5"))
pred_saved = saved_model.predict(x)
predict_classes = np.argmax(pred_saved,axis=1)
correct_saved = metrics.accuracy_score(expected_classes,predict_classes)
print(f"Saved Model Accuracy: {correct_saved}")
print(f"Before Saving Accuracy: {correct}")