NetLogo User Community Models
Reinforcement Learning Wargame
by Joe Roop (Submitted: 05/08/2006)
WHAT IS IT?
This model implements Q-learning (Watkins 1989), a one-step temporal-difference algorithm from the field of reinforcement learning.
HOW IT WORKS
The agent (a strike aircraft, blue) can sense the state of the game in the form of health, distances, and number of weapons. After sensing the state and receiving a reward, the agent chooses from 8 different actions that manipulate the state space, such as evading left or right, flying toward a SAM, or firing a weapon at the SAM. The following Q-learning update is used:
Q(s,a) = Q(s,a) + step-size * [reward + discount * max(Q(s',a')) - Q(s,a)]
The agent keeps making moves until it runs out of weapons, dies, or destroys the 'target' SAM site. The rewards are -2 points per weapon fired, -200 points for dying, and +1000 points for destroying the 'target' SAM. The agent also has the option of turning on stealth technology, which keeps it from being seen by the SAM sites.
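The update rule above can be sketched in a few lines. This is a minimal illustration of one-step Q-learning with a tabular value function, not the model's actual NetLogo code; the state labels, action encoding, and parameter values are assumptions for the example.

```python
# Minimal sketch of the one-step Q-learning update described above.
# State/action encodings and parameter values are illustrative only.
from collections import defaultdict

def q_update(Q, s, a, reward, s_next, actions, step_size=0.1, discount=0.9):
    """Apply Q(s,a) += step-size * [reward + discount * max_a' Q(s',a') - Q(s,a)]."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += step_size * (reward + discount * best_next - Q[(s, a)])
    return Q[(s, a)]

Q = defaultdict(float)          # unseen state-action pairs default to 0
actions = range(8)              # the model's 8 actions (evade, fly toward SAM, fire, ...)
# one hypothetical transition: firing a weapon costs -2 points
q_update(Q, s="approach", a=3, reward=-2, s_next="closer", actions=actions)
```

With all Q-values initially zero, the single update moves Q("approach", 3) to step-size * reward = -0.2; repeated episodes propagate the +1000 terminal reward backward through the table.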
HOW TO USE IT
The buttons and sliders control the setup and all the parameters inside the algorithm. The graph plots the average reward obtained per episode. The step-size parameter controls how far old values are updated toward new values. Discount is the present value of future rewards. Exploration-% is the fraction of moves in which the agent takes a non-optimal patch, which helps the agent explore more tactics and avoid getting stuck in local optima.
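One common way an exploration-% slider drives action selection is epsilon-greedy choice; the sketch below shows that scheme under the assumption that the model works this way (the actual NetLogo selection code may differ).

```python
# Epsilon-greedy action selection driven by an exploration-% setting.
# Illustrative sketch; not the model's actual NetLogo procedure.
import random

def choose_action(Q, state, actions, exploration_pct):
    """With probability exploration-%/100 take a random action,
    otherwise take the action with the highest Q-value."""
    if random.random() < exploration_pct / 100.0:
        return random.choice(list(actions))
    return max(actions, key=lambda a: Q.get((state, a), 0.0))

Q = {("s0", 0): 1.5, ("s0", 1): 3.0, ("s0", 2): -1.0}
# with exploration-% = 0 the greedy action (1) is always chosen
print(choose_action(Q, "s0", range(3), exploration_pct=0))  # -> 1
```

Setting exploration-% to 0 makes the agent purely greedy, which is why a nonzero value is needed early in training to escape local optima.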
THINGS TO NOTICE
The average reward in the graph increases over the number of episodes the agent has trained on, which shows the agent's learning process. With stealth technology enabled, does the agent adopt different tactics?
THINGS TO TRY
Experiment with the algorithm parameters such as step-size, discount, and exploration-%. Also, investigate the environmental parameters.
EXTENDING THE MODEL
Implement different reward schemes that encourage more direct and optimal paths, such as -1 point for every move the agent makes, forcing the agent to find a more direct approach to the 'target' SAM. Add a more robust exploration routine. The model is set up for multi-agent learning; however, more advanced cooperation-versus-self-interest algorithms need to be implemented to cope with the unstable environment that multi-agent learning can cause.
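The suggested per-move penalty can be prototyped as a small reward function. The event names and episode below are hypothetical, chosen only to show how the -1-per-move term shortens optimal paths.

```python
# Sketch of the suggested alternative reward scheme: -1 point per move
# to encourage shorter paths. Event names are illustrative, not the model's.
def reward(event):
    return {"move": -1, "fire": -2, "die": -200, "kill_target": 1000}.get(event, 0)

# a hypothetical 5-move episode that fires once and destroys the target:
total = sum(reward(e) for e in ["move"] * 5 + ["fire", "kill_target"])
print(total)  # -> 993
```

Under this scheme a 10-move route to the same kill earns 5 points less than the 5-move route, so the learned policy is pushed toward the direct approach.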
This model requires an outside file ("agent.rtf") in order to store the learned tactics. If an error appears for "LOAD-STATE-ACTION-FILE", click the "Clear/Create File" button; the "agent.rtf" file will be created, and loading will work as long as there is permission to write in the directory where the model is stored.
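Persisting learned state-action values to a text file can be sketched as below. The file name and line format here are assumptions for illustration; they are not the actual layout of "agent.rtf".

```python
# Illustrative sketch of saving/loading a learned Q-table to a text file,
# analogous to the model's "agent.rtf" persistence. Format is assumed.
import os

def save_q(Q, path):
    with open(path, "w") as f:
        for (state, action), value in sorted(Q.items()):
            f.write(f"{state} {action} {value}\n")

def load_q(path):
    Q = {}
    if not os.path.exists(path):    # mirror of "Clear/Create File":
        open(path, "w").close()     # create an empty file and start fresh
        return Q
    with open(path) as f:
        for line in f:
            state, action, value = line.split()
            Q[(state, int(action))] = float(value)
    return Q

Q = {("approach", 3): -0.2}
save_q(Q, "agent.txt")
print(load_q("agent.txt"))  # -> {('approach', 3): -0.2}
```

Creating an empty file when none exists plays the same role as the model's "Clear/Create File" button: training simply starts from an empty Q-table.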
CREDITS AND REFERENCES
Watkins, C.J.C.H. (1989). Learning from Delayed Rewards. PhD thesis, University of Cambridge.