NetLogo Models Library:
This is a model of a very small neural network. It is based on the Perceptron model, but instead of one layer, this network has two layers of "perceptrons". Furthermore, the layers activate each other in a nonlinear way. These two additions means it can learn operations a single layer cannot.
The goal of a network is to take input from its input nodes on the far left and classify those inputs appropriately in the output nodes on the far right. It does this by being given a lot of examples and attempting to classify them, and having a supervisor tell it if the classification was right or wrong. Based on this information the neural network updates its weight until it correctly classifies all inputs correctly.
Initially the weights on the links of the networks are random.
The nodes on the left are the called the input nodes, the nodes in the middle are called the hidden nodes, and the node on the right is called the output node.
The activation values of the input nodes are the inputs to the network. The activation values of the hidden nodes are equal to the activation values of inputs nodes, multiplied by their link weights, summed together, and passed through the sigmoid function. Similarly, the activation value of the output node is equal to the activation values of hidden nodes, multiplied by the link weights, summed together, and passed through the sigmoid function. The output of the network is 1 if the activation of the output node is greater than 0.5 and 0 if it is less than 0.5.
The sigmoid function maps negative values to values between 0 and 0.5, and maps positive values to values between 0.5 and 1. The values increase nonlinearly between 0 and 1 with a sharp transition at 0.5.
In order for the network to learn anything, it needs to be trained. In this example, the training algorithm used is called the backpropagation algorithm. It consists of two phases: propagate and backpropagate. The propagate phase was described above: it propagates the activation values of the input nodes to the output node of the network. In the backpropagate phase, the error in the produced value is passed back through the network layer by layer.
To do the backpropagation phase, the error is first calculated as a difference between the correct (expected) output and the actual output of the network. Since all of the hidden nodes connected to the output contribute to the error, all of the weights need to be updated. To do this, we need to calculate how much each of the nodes contributed to the overall error on the output. This is done by calculating a local gradient for each of the nodes, excluding the input nodes (since the input is the activation we provide to the network, and thus has no error associated with it).
The local gradients are calculated layer by layer. For the output nodes, it is the multiplication of the error with the result of passing the activation value to the derivative of the activation function. Since, in this model, the activation function is the sigmoid function, its simplified derivative ends up being:
activation_value * (1 - activation_value)
If we wished to use a different activation function, we would use the derivative of that function instead.
For each hidden node, the local gradient is calculated as follows:
For each output node connected to the hidden node, multiply its local gradient with the weight of the link connecting them;
Sum all the results from the previous step;
Multiply that sum with the result of passing the activation value of the hidden node to the derivative of the activation function.
To update the weights of each of the links, we multiply the learning rate with the local gradient of
end2 (this will be the output node in case the link connects a hidden node with the output node) and the activation value of
end1 (this will be the hidden node in case the link connects a hidden node with the output node). The result is then added to the old weight.
The propagate and backpropagate phases are repeated for each example shown to the network.
To use it press SETUP to create the network and initialize the weights to small random numbers.
Press TRAIN ONCE to run one epoch of training. The number of examples presented to the network during this epoch is controlled by EXAMPLES-PER-EPOCH slider.
Press TRAIN to continually train the network.
In the view, the larger the size of the link the greater the weight it has. If the link is red then it has a positive weight. If the link is blue then it has a negative weight.
If SHOW-WEIGHTS? is on then the links will be labeled with their weights.
To test the network, set INPUT-1 and INPUT-2, then press the TEST button. A dialog box will appear telling you whether or not the network was able to correctly classify the input that you gave it.
LEARNING-RATE controls how much the neural network will learn from any one example.
TARGET-FUNCTION allows you to choose which function the network is trying to solve.
Unlike the Perceptron model, this model is able to learn both OR and XOR. It is able to learn XOR because the hidden layer (the middle nodes) and the nonlinear activation allows the network to draw two lines classifying the input into positive and negative regions. A perceptron with a linear activation can only draw a single line. As a result one of the nodes will learn essentially the OR function that if either of the inputs is on it should be on, and the other node will learn an exclusion function that if both of the inputs or on it should be on (but weighted negatively).
However unlike the perceptron model, the neural network model takes longer to learn any of the functions, including the simple OR function. This is because it has a lot more that it needs to learn. The perceptron model had to learn three different weights (the input links, and the bias link). The neural network model has to learn ten weights (4 input to hidden layer weights, 2 hidden layer to output weight and the three bias weights).
Manipulate the LEARNING-RATE parameter. Can you speed up or slow down the training?
Switch back and forth between OR and XOR several times during a run. Why does it take less time for the network to return to 0 error the longer the network runs?
Add additional functions for the network to learn beside OR and XOR. This may require you to add additional hidden nodes to the network.
Backpropagation using gradient descent is considered somewhat unrealistic as a model of real neurons, because in the real neuronal system there is no way for the output node to pass its error back. Can you implement another weight-update rule that is more valid?
This model uses the link primitives. It also makes heavy use of lists.
This is the second in the series of models devoted to understanding artificial neural networks. The first model is Perceptron.
The code for this model is inspired by the pseudo-code which can be found in Tom M. Mitchell's "Machine Learning" (1997).
See also Haykin (2009) Neural Networks and Learning Machines, Third Edition.
Thanks to Craig Brozefsky for his work in improving this model and to Marin Aglić Čuvić for info tab improvements.
If you mention this model or the NetLogo software in a publication, we ask that you include the citations below.
For the model itself:
Please cite the NetLogo software as:
Copyright 2006 Uri Wilensky.
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-sa/3.0/ or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.
Commercial licenses are also available. To inquire about commercial licenses, please contact Uri Wilensky at firstname.lastname@example.org.