NetLogo User Community Models
Learning Complicated Games -Galla&Farmer (2012)
by Russell C. Thomas (Submitted: 01/08/2016)
## WHAT IS IT?
Replicates the model in "Complex dynamics in learning complicated games" by Galla and Farmer (2012). (http://www.pnas.org/cgi/doi/10.1073/pnas.1109672110)
This model shows the dynamics of strategies when two or more agents try to play complicated games (large number of possible moves, bounded rationality about long-term payoffs, etc.). The payoff matrices of the games are generated randomly, based on parameters described below. This allows experiments over a large number of games with similar qualities, without making assumptions about the detailed structure of the game (as is typical in analytic Game Theory).
The purpose of the model is to explore the conditions that fit traditional Game Theory analysis (e.g. unique equilibrium) and those that do not (multiple equilibria, chaotic dynamics). And given the complexity of the game, the model allows us to explore how well this learning model works to enable players to evolve cooperative solutions, even when the payoff matrix might be unappealing.
## HOW IT WORKS
The grid shows the payoff matrix for the two focal agents, "A" and "B", with each cell in the matrix color coded according to the pair of payoffs. Payoffs can range from -3.0 to +3.0. The color code is:
* Red = positive for A, negative for B
When Gamma is set to -1.0 (perfect negative correlation), you will only see shades of red or green. When Gamma is set to 1.0 (perfect positive correlation) you will only see shades of yellow (to olive and dark brown). When Gamma is set to zero, you will see the full range of colors.
The black square on the grid highlights the current payoff, given the move choices of agents A and B. The line drawn links the current payoff to the payoff in the previous tick.
## HOW TO USE IT
Click "Reset" to initialize the parameters in their default settings.
Click "Setup" to load the parameters and to initialize the payoff matrices.
Click "Go" to run the model continuously. Unclick to stop the model
Click "Step" to single step the model.
Set "num-agents" (currently fixed at 2, but will be expanded to 100 in later versions.)
Set "num-possible-moves" (N), which is the number of possible moves for each agent, ranging from 10 to 100. (Galla & Farmer 2012 use 50)
"Gamma" is the parameter that controls the correlation between agent payoffs. -1.0 is perfect negative correlation (i.e. zero sum game). +1.0 is perfect positive correlation (i.e. positive sum game). 0.0 is uncorrelated.
"alpha" is the parameter that controls agent memory (i.e. influence of past probabilities) when updating their preference probabilities. 0 is unlimited memory (no decay in the influence of past probabilities), while 1.0 is no memory (100% decay, meaning no influence of past probabilities).
"beta" is the parameter that controls randomness associated with the move choice decision. 0 is fully random with equal probability for each move. Infinity (i.e. large positive number) is a deterministic choice for the move with the highest "attractiveness". Galla & Farmer 2012 fix this at 0.07.
"learning-mode" is either "experience weighted", which is the Galla & Farmer model, or "none", which is no updating.
## THINGS TO NOTICE
The primary results to notice are the bar graphs for "Pr(move)" for the two focal agents, A and B. Do they settle into a stable pattern? (which would indicate an equilibrium) Or do they oscillate regularly? (which would incidate limit cycles) Or are the oscillations chaotic? (seemingly random)
It is interesting to compare the effectiveness of learning on cumulative payoffs, compared to no learning. Can the agents, in effect, cooperate to make the most of the payoff options available? With shorter memories (i.e. higher values of alpha), agents can reach an equilibrium in probabilities where they prefer a broad range of moves, but fail to home in on the few (or one) move that might yield the best payoff for all agents (i.e. a bright yellow cell in the payoff matrix).
## THINGS TO TRY
Try varying Gamma and alpha to see whether the probabilities Pr(move) demonstrate equilibria, limit cycles, or chaos. (Galla & Farmer 2012, p 2 and 3 show various parameter settings for Gamma and alpha that yeild interesting results)
Try varying "num-possible-moves" (N) to see if the complexity of the game has any effect.
## EXTENDING THE MODEL
**Detect equilibrium, and then stop.** Currently the model runs forever (i.e. until it is stopped manually). It would be useful to monitor changes in each players move probabilities and then stop the simulation when those probabilities stopped changing.
**Auto-detect "interesting" Pr(move) indices for plotting.** Currently, the experimentor manually sets the index values for plotting Pr(move) in the time series. Ideally, you'd like them to be "interesting" (i.e. large value and/or varying significantly with time). It would be good to detect this automatically after some number of ticks (~1,000).
**Support more than two agents.** Galla & Farmer (2012) include multiple players in their setup, but then reduce to only two players in their model and results. They don't describe how to implement multiple players, especially in light of the Gamma parameter which is the correlation in payoffs between any two players. With three or more players, it is not possible to have all the pair-wise correlations to be negative. One approach would be to have C(M,2) 2-player games, where C(M,2) is the number combinations of 2-player games with M players. (The formula for combinations is C(n,r) = n! / ( r! (n - r)! ), where n is the number of entities and r is the number in the subset.) For various numbers of players:
* C(2,2) = 1
As you can see, the number of combinations grows rapidly with the total number of players. Since each payoff matrix is N X N, having a large number of players becomes infeasable on ordinary personal computers due to memory limitations. One viable way of simplifying would be to not have different payoff matrices for each and every pair-wise game. You could even have just two -- one for a positively correlated game and one for the complementary negatively correlated game. Then each player plays both sides of each game, depending on the particular opponent.
**Probabilistic payoffs.** In some settings (e.g. cyber security) the payoff from moves is not deterministic, and instead is probabilistic. This could be modeled by using a random draw from a given probability distribution. It would be interesting to see how this would change the results if payoff distributions had long tails (i.e. enabling extreme payoffs on rare occasions).
**Uncertainty and/or imperfect knowledge regarding payoffs.** Along the same lines, it would be interesting to see if results would change if players had imperfect knowledge of their payoffs for a given move, or if their was uncertainty involved. It might be possible to incorporate this into the "experience weighted learning" model.
**Sequential and/or combinatorial game.** The current design is a one-shot iterative game. Many settings (e.g. investment sequencing) require sequential and/or combinatorial game structure.
## NETLOGO FEATURES
Uses Array and Matrix extensions.
## RELATED MODELS
The Galla & Farmer (2012) model is aligned with Evolutionary Game Theory. There are a number of NetLogo models that implement some version of Evolutionary Game Theory. Examples include:
* GameTheory, by Rick O'Gorman (Submitted: 07/22/2014)
However, none of these models is focused on complex games (i.e. large strategy space) and associatied learning mechanisms. The closest are probably those involving replicator dynamics (e.g. Russell 2007).
## CREDITS AND REFERENCES
Written by Russell C. Thomas (2016), based on paper and supplementary material:
Galla, T., & Farmer, J. D. (2013). Complex dynamics in learning complicated games. Proceedings of the National Academy of Sciences, 110(4), 1232–1236. http://doi.org/10.1073/pnas.1109672110
For discussion and analysis of this model, see this blog post:
(back to the NetLogo User Community Models)