Softmax Explained
Understanding the standard activation function for classification.
Problem
Given a collection of numbers, each representing a score for some item, you want to transform them into a metric that identifies the highest value and has the following properties:
- the resulting metric should normalize all values, that is, the transformed values should sum to 1
- the metric should favor one item among the numbers (the one with the highest original value), boosting it so that it stands apart more clearly
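These two properties are exactly what the softmax function provides. For a vector x it is defined component-wise as:

```latex
\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j} e^{x_j}}
```

The exponential makes every output positive and amplifies differences between inputs; dividing by the sum normalizes the outputs to 1.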
Use case
Activation function in the last layer of a classification network. Each item stands for a certain class, and exactly one class is to be selected. The previous layer can produce activations of almost any magnitude (typically between 0 and infinity), but at the end we want a normalized output that tells us which class has been activated.
import numpy as np
original_values = np.random.randn(10)
original_values
step1 = np.exp(original_values)
step1
# check that the ranks in both arrays are still the same (order is preserved)
from numpy.testing import assert_array_equal
assert_array_equal(np.argsort(step1), np.argsort(original_values))
# check if all values are positive
assert all(step1 > 0)
step2 = step1/step1.sum()
step2
# check if all values are between 0 and 1
softmax_values = step2
assert all(0 <= softmax_values)
assert all(softmax_values <= 1)
# check if the values sum up to 1
from numpy.testing import assert_almost_equal
assert_almost_equal(softmax_values.sum(), 1)
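The two steps above can be folded into a single function. A common refinement, used by most library implementations, is to subtract the maximum value before exponentiating: the shift cancels out in the normalization, so the result is unchanged, but it prevents overflow for large inputs. A minimal sketch:

```python
import numpy as np

def softmax(x):
    # subtracting the max shifts all exponents into a safe range;
    # the shift cancels in the normalization, so the output is unchanged
    shifted = np.exp(x - np.max(x))
    return shifted / shifted.sum()

# naive np.exp(x) would overflow for these values
x = np.array([1000.0, 1001.0, 1002.0])
print(softmax(x))
```

Without the max-subtraction, `np.exp(1000.0)` overflows to infinity and the division produces NaNs.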
# Plot the original_values versus the softmax values.
# Both arrays are sorted in increasing order. You can see that the
# curve for softmax_values is slightly steeper at the upper end,
# illustrating the boost given to the higher values.
import matplotlib.pyplot as plt
with plt.xkcd():
    fig, (ax1, ax2) = plt.subplots(2, 1)
    ax1.plot(sorted(original_values));
    ax1.set(ylabel="Original values");
    ax2.plot(sorted(softmax_values));
    ax2.set(ylabel="Softmax values");
import keras
from keras import backend as K
keras_result = keras.activations.softmax(
K.variable(value=original_values.reshape(1, -1)), axis=-1).numpy().flatten()
# Keras computes softmax in a numerically different way (and in
# float32 by default), so the results agree only up to limited precision
from numpy.testing import assert_array_almost_equal
assert_array_almost_equal(keras_result, softmax_values)
import torch
pytorch_result = torch.nn.functional.softmax(
torch.tensor(original_values.reshape(1, -1)), dim=1).numpy().flatten()
pytorch_result
assert_array_almost_equal(pytorch_result, softmax_values)
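Finally, tying back to the original goal of identifying the highest value: because the exponential is strictly increasing and the normalization divides by a positive constant, softmax never changes which index wins. A small standalone check (using fresh random values with an arbitrary fixed seed, so it does not depend on the arrays above):

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary fixed seed for reproducibility
values = rng.standard_normal(10)

exps = np.exp(values)
probs = exps / exps.sum()

# the winning class is unchanged...
assert np.argmax(probs) == np.argmax(values)
# ...and in fact the entire ordering is preserved
assert (np.argsort(probs) == np.argsort(values)).all()
```

This monotonicity is why softmax is safe to use as the final layer: taking the argmax of the probabilities selects the same class as taking the argmax of the raw activations.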