In Chapter 9, we used a two-layer neural network to verify the universal approximation theorem. At that time the network was hard-coded; now we build the same model with the mini framework.
This model is very simple: a two-layer neural network. The first layer is followed by a Sigmoid activation function, and the second layer directly outputs the fitted values, as shown in Figure 14-2.
Figure 14-2 The abstract model for the fitting task
```Python
def model():
    dataReader = LoadData()
    num_input = 1
    num_hidden1 = 4
    num_output = 1
    max_epoch = 10000
    batch_size = 10
    learning_rate = 0.5

    params = HyperParameters_4_0(
        learning_rate, max_epoch, batch_size,
        net_type=NetType.Fitting,
        init_method=InitialMethod.Xavier,
        stopper=Stopper(StopCondition.StopLoss, 0.001))

    net = NeuralNet_4_0(params, "Level1_CurveFittingNet")

    # network structure: fc1 -> Sigmoid -> fc2, as in Figure 14-2
    fc1 = FcLayer_1_0(num_input, num_hidden1, params)
    net.add_layer(fc1, "fc1")
    sigmoid1 = ActivationLayer(Sigmoid())
    net.add_layer(sigmoid1, "sigmoid1")
    fc2 = FcLayer_1_0(num_hidden1, num_output, params)
    net.add_layer(fc2, "fc2")

    net.train(dataReader, checkpoint=100, need_test=True)

    net.ShowLossHistory()
    ShowResult(net, dataReader)
```
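The excerpt above only defines the model() function; in the sample code it would presumably be invoked from a standard entry point like the sketch below (illustrative only, not shown in the excerpt):

```Python
if __name__ == '__main__':
    model()
```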
Hyperparameter description:
- 1 neuron in the input layer, because there is only one x value
- 4 neurons in the hidden layer, which is sufficient for this problem since there are very few features
- 1 neuron in the output layer, because this is a fitting task
- Learning rate = 0.5
- Maximum epoch = 10000 rounds
- Batch size = 10
- Network type: fitting
- Xavier initialization (a minimal sketch of this scheme follows the list)
- Absolute loss stop condition = 0.001
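The Xavier initialization mentioned above scales the initial weights according to each layer's fan-in and fan-out. The NumPy sketch below illustrates the common uniform variant of the formula; it is not the mini framework's own InitialMethod.Xavier code, and the function name is made up for illustration.

```Python
import numpy as np

def xavier_uniform(fan_in, fan_out):
    # Glorot/Xavier uniform initialization: limit = sqrt(6 / (fan_in + fan_out))
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return np.random.uniform(-limit, limit, size=(fan_in, fan_out))

# for example, the hidden layer of this model: 1 input, 4 hidden neurons
W1 = xavier_uniform(1, 4)
```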
Figure 14-3 Changes in the loss function value and accuracy while training
As shown in Figure 14-3, the loss value stays almost flat for a while and then drops sharply. This is common in neural network training: the optimizer is likely crossing a plateau where the gradients are small, and once it leaves that flat region, progress comes suddenly. This behaviour also raises a practical question: when we hit such a gentle slope, should we continue training? Will the model find a way out if we hold on a little longer, or is its capacity simply insufficient so that it never will? There is no exact answer; it can only be judged from experimentation and experience. One pragmatic heuristic is sketched below.
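A common rule of thumb is a patience check: keep training as long as the monitored loss has improved within the last few checkpoints. The helper below only illustrates that idea on a list of recorded loss values; it is not part of the mini framework.

```Python
def should_keep_training(loss_history, patience=20, min_delta=1e-4):
    """Return True if the checkpointed loss improved by at least min_delta
    within the last `patience` checkpoints; otherwise suggest stopping."""
    if len(loss_history) <= patience:
        return True
    recent_best = min(loss_history[-patience:])
    earlier_best = min(loss_history[:-patience])
    return earlier_best - recent_best >= min_delta
```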
Figure 14-4 Fitting results
The left subplot of Figure 14-4 shows the fitting result. The green dots are the test dataset, and the red dots are the values inferred by the neural network. You can see that the data is fitted well everywhere except at the leftmost part. Note that we are not discussing over-fitting or under-fitting here; our purpose in this chapter is simply to fit the curve better.
The right-hand subplot of Figure 14-4 was generated with the following code:
```Python
y_test_real = net.inference(dr.XTest)
axes.scatter(y_test_real, y_test_real - dr.YTestRaw, marker='o')
```
The predicted value on the test set is taken as the abscissa, and the difference between the predicted value and the true value as the ordinate. Ideally, all points would lie on the horizontal line y=0. In the figure the differences between the predicted and true values look obvious, but note that the abscissa and ordinate scales differ by roughly an order of magnitude, so the differences are actually small. A self-contained version of this plot is sketched below.
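For readers who want to reproduce the right-hand subplot, here is a self-contained matplotlib sketch. It assumes net and dr are the trained network and data reader from the model() function above; the axis labels are our own choice.

```Python
import matplotlib.pyplot as plt

# predicted values on the test set (same call as in the snippet above)
y_test_real = net.inference(dr.XTest)

fig, axes = plt.subplots()
axes.scatter(y_test_real, y_test_real - dr.YTestRaw, marker='o')
axes.axhline(0, color='gray', linestyle='--')  # ideal case: every residual on y=0
axes.set_xlabel("predicted value")
axes.set_ylabel("predicted - true")
plt.show()
```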
Look at the last part of the output:
```
epoch=4999, total_iteration=449999
loss_train=0.000920, accuracy_train=0.968329
loss_valid=0.000839, accuracy_valid=0.962375
time used: 28.002626419067383
save parameters
total weights abs sum= 45.27530164993504
total weights = 8
little weights = 0
zero weights = 0
testing...
0.9817814550687021
0.9817814550687021
```
Since we set eps=0.001, the stop condition was met at about epoch 5000 and training stopped. The final accuracy on the test set is 98.17%, which is already very good. Training for more rounds could yield even better results.
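The internals of Stopper(StopCondition.StopLoss, 0.001) are not shown here; conceptually, at every checkpoint the framework performs a check like the sketch below (an illustration, not the library's actual code).

```Python
def loss_stop_reached(loss, eps=0.001):
    # StopLoss-style check: stop once the monitored loss falls below eps
    return loss < eps
```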
Code location: ch14, Level1