Artificial Intelligence By Example

Softmax

The softmax function appears in many artificial intelligence models to normalize data. It is a fundamental function to understand and master. In the warehouse model, the AGV needs to choose the most probable of the six locations in the lv vector. However, the sum of the lv values exceeds 1, so lv requires normalization by the softmax function S. In this sense, the softmax function can be considered a generalization of the logistic sigmoid function. In the code, the lv vector is named y.
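For reference, this is the definition that the following code implements: each score is exponentiated and then divided by the sum of all the exponentials, so the outputs are positive and sum to 1:

S(y)_i = \frac{e^{y_i}}{\sum_{j=1}^{n} e^{y_j}}, \quad i = 1, \ldots, n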

The following code is from SOFTMAX.py; y represents the lv vector in the source code.

import math

# y is the vector of the scores of the lv vector in the warehouse example:
y = [0.0002, 0.2, 0.9, 0.0001, 0.4, 0.6]

y_exp is the list of exp(i) results for each value i in y (lv in the warehouse example), as follows:

y_exp = [math.exp(i) for i in y]

sum_exp_yi is the sum of these exponential values, as shown in the following code:

sum_exp_yi = sum(y_exp)

Now, each value of the vector can be normalized into this type of multinomial distribution by simply applying a division, as follows:

softmax = [round(i / sum_exp_yi, 3) for i in y_exp]

# Example with another vector to be stabilized: [2.0, 1.0, 0.1, 5.0, 6.0, 7.0]
# Stabilized vector: [0.004, 0.002, 0.001, 0.089, 0.243, 0.661]

softmax(lv) provides a normalized vector whose values sum to 1, as shown in this compressed version of the code. The raw input scores fed into softmax are often described as logits; the output is a probability distribution.

The following code details the process:

import numpy as np

def softmax(x):
    return np.exp(x) / np.sum(np.exp(x), axis=0)

y1 = [0.0002, 0.2, 0.9, 0.0001, 0.4, 0.6]
print("Stabilized vector", softmax(y1))
print("sum of vector", sum(softmax(y1)))
# Stabilized vector [ 0.11119203 0.13578309 0.27343357 0.11118091 0.16584584 0.20256457]
# sum of vector 1.0
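A common refinement, not shown in the book's snippet, is to subtract the maximum score before exponentiating. This does not change the result mathematically, but it avoids numerical overflow when the scores are large. A minimal sketch, assuming the same NumPy import and the y1 vector defined above (the name softmax_stable is illustrative):

def softmax_stable(x):
    # Subtracting the max shifts the scores without changing the softmax output,
    # because the shift cancels out between the numerator and the denominator.
    z = np.exp(x - np.max(x))
    return z / np.sum(z, axis=0)

print("Stabilized vector (stable variant)", softmax_stable(y1))
# Produces the same stabilized vector as softmax(y1)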

The softmax function can be used as the output layer of a classifier (over image classes, for example) or to drive a decision. In this warehouse case, it transforms lv into a decision-making process.

The last part of the process requires the values of softmax(lv) to be rounded to 0 or 1. The higher a value in softmax(lv), the more probable it is. In clear-cut transformations, the highest value will be close to 1 and the others will be closer to 0. In a decision-making process, the highest value needs to be found, as follows:

print("highest value in transformed y vector",max(softmax(y1)))
#highest value in normalized y vector 0.273433565194

Once the third value (0.273) has been chosen as the most probable location, it is set to 1 and the other, lower values are set to 0. This is called a one-hot function, and it is extremely helpful for encoding the data. The vector obtained can now be applied to the reward matrix: the probability of 1 becomes 100 in the R reward matrix, as follows.
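A minimal sketch of this step, assuming NumPy and the softmax function defined above (the helper name one_hot_reward is illustrative, not part of the book's source code):

import numpy as np

def one_hot_reward(scores, scale=100):
    # Convert the softmax output into a one-hot vector: 1 at the most
    # probable location, 0 elsewhere, then scale it for the reward matrix.
    probabilities = softmax(scores)
    one_hot = np.zeros_like(probabilities)
    one_hot[np.argmax(probabilities)] = 1
    return one_hot * scale

print(one_hot_reward(y1))
# [  0.   0. 100.   0.   0.   0.]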

The softmax function is now complete. Location l3, or C, is the best solution for the AGV. The probability value is multiplied by 100, and the reward matrix described in the first chapter can now receive this input.

Before continuing, take some time to play around with the values in the source code and run it to become familiar with softmax.

We now have the data for the reward matrix. The best way to understand the mathematical aspect of the project is to draw the result on a paperboard using the actual warehouse layout from locations A to F.

Locations={l1-A, l2-B, l3-C, l4-D, l5-E, l6-F}

Value of locations in the reward matrix={0,0,100,0,0,0} where C (the third value) is now the target for the self-driving vehicle, in this case, an AGV in a warehouse.

 

We obtain the following reward matrix R described in the first chapter.

This reward matrix is exactly the one used in the Python reinforcement learning program with the Q function in the first chapter. The output of this chapter thus becomes the input R matrix of that program. The 0 values mark transitions the agent should avoid. The program is designed to stay close to probability standards by using positive values, as shown in the following R matrix.

R = ql.matrix([ [0,0,0,0,1,0],
                [0,0,0,1,0,1],
                [0,0,100,1,0,0],
                [0,1,1,0,1,0],
                [1,0,0,1,0,0],
                [0,1,0,0,0,0] ])

At this point, the building blocks are in place to begin evaluating the results of the reinforcement learning program.