

NetLogo User Community Models



by Aldo Martinez-Pinanez (Submitted: 11/26/2009)

[screen shot]

Download PentapalliOnlineVersion
If clicking does not initiate a download, try right clicking or control clicking and choosing "Save" or "Download".

(You can also run this model in your browser, but we don't recommend it; details here.)


WHAT IS IT?

This model tries to implement in NetLogo Pentapalli's comparative study of Roth-Erev and Modified Roth-Erev reinforcement learning algorithms for uniform-price double auctions. The main objective is to facilitate computational experiments in order to understand the behavior of learning algorithms in multi-agent contexts. The slides for this M.S. thesis can be found at:

In order to understand this NetLogo model, I strongly encourage you to first read the above-mentioned slides.

This is not the final version, and it is subject to revision. One purpose of uploading this model is to share it with people who are interested in reinforcement learning.

If you detect any flaws in the model or have any constructive comments, please do not hesitate to contact me at


HOW IT WORKS

There are six sellers and six buyers, and just one seller, an intelligent seller, uses reinforcement learning. The intelligent seller is represented in the interface by an orange fox face. The other sellers are represented by blue circles and the buyers by red squares. Each trader is labeled with its reservation value.

The intelligent seller has four sale-price choices (ReservationValues 10, 40, 50, and 60), while the sellers and buyers as a whole have fixed reservation values (sellers [10, 20, 30, 40, 50, 60], buyers [10, 20, 30, 40, 50]).

A market operator constructs the supply/demand curves and calculates the market clearing price. The seller's profit thus equals the market clearing price minus the seller's reservation value. Under these settings, depending on the ReservationValue of the intelligent seller, just two clearing prices are possible: 30 and 35. The market operator is represented by a green pentagon labeled with the current clearing price.

Just one price choice generates a positive profit for the learning seller: when its ReservationValue equals 10. In that case, the clearing price equals 30 and the profit is 20 (30 - 10 = 20). In all other cases, the intelligent seller makes a negative profit.
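The two clearing prices can be reproduced with a minimal sketch (Python is used here purely for illustration; the model itself is written in NetLogo). The midpoint pricing rule — taking the clearing price as the midpoint of the marginal matched bid and ask — is an assumption inferred from the numbers above, not necessarily the model's exact convention:

```python
def clearing_price(asks, bids):
    """Uniform-price double auction: match the lowest asks with the
    highest bids; the clearing price is taken as the midpoint of the
    marginal (last matched) bid and ask.  Assumes at least one trade."""
    asks, bids = sorted(asks), sorted(bids, reverse=True)
    marginal = None
    for a, b in zip(asks, bids):
        if b >= a:              # this buyer-seller pair can trade
            marginal = (a, b)
        else:
            break
    a, b = marginal
    return (a + b) / 2

# Intelligent seller asks 10 -> clearing price 30, profit 30 - 10 = 20:
clearing_price([10, 20, 30, 40, 50, 60], [50, 40, 30, 20, 10])  # -> 30.0
# Intelligent seller asks 40 -> clearing price 35, "profit" 35 - 40 = -5:
clearing_price([40, 20, 30, 40, 50, 60], [50, 40, 30, 20, 10])  # -> 35.0
```

With the fixed reservation values listed above, any choice other than 10 leaves the intersection unchanged at 35, which is why only two clearing prices ever occur.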

In every market iteration, the intelligent seller chooses an action by specifying a reservation value with the help of the learning algorithm. The grey dots on the right of the interface represent the possible actions, labeled with the reservation value they represent. Every action is linked to the intelligent seller; each link represents an opinion of the seller about that action. Opinions have two variables, namely Propensity and Probability. The probability of a link is the chance that its action will be chosen in the next iteration. The propensity of each action affects its probability, depending on the experimentation parameter, the recency parameter, the number of actions, and the profits.
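The propensity and probability updates can be sketched as follows (Python for illustration only). The formulas follow the commonly cited Roth-Erev and Modified Roth-Erev specifications, with a Gibbs-Boltzmann rule, governed by the cooling parameter, assumed for the Variant algorithm; the thesis may differ in details:

```python
import math

def roth_erev_update(q, chosen, reward, recency, experimentation,
                     modified=False):
    """One propensity update.  q: propensities; chosen: index of the action
    just played; reward: the profit received.  In the original Roth-Erev
    rule, non-chosen actions share a fraction of the reward; the Modified
    rule spreads a fraction of their own propensity instead (which also
    behaves sensibly when rewards are zero or negative)."""
    n = len(q)
    updated = []
    for j, qj in enumerate(q):
        if j == chosen:
            e = reward * (1 - experimentation)
        else:
            e = (qj if modified else reward) * experimentation / (n - 1)
        updated.append((1 - recency) * qj + e)
    return updated

def probabilities(q, cooling=None):
    """Proportional rule for (Modified) Roth-Erev; Gibbs-Boltzmann rule,
    parameterized by the cooling parameter, for the Variant algorithm."""
    if cooling is None:
        total = sum(q)
        return [qj / total for qj in q]
    weights = [math.exp(qj / cooling) for qj in q]
    total = sum(weights)
    return [w / total for w in weights]
```

For example, with InitialPropensity 100, RecencyParameter 0.1, and ExperimentationParameter 0.2, a profit of 20 on the chosen action raises its propensity to 0.9 * 100 + 0.8 * 20 = 106 while the other propensities decay.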


HOW TO USE IT

Click the SETUP button to set up the world, then click the GO button. The agents begin to ask and bid while the market operator determines the clearing price, and the intelligent seller starts to select actions according to the chosen learning algorithm. The sliders allow you to change each of the model parameters described below. The plots show the state of the probabilities and propensities, how many times each action is chosen, and how profits develop as the intelligent agent learns and switches among the actions.

The model includes the following parameters:

**ExperimentationParameter** - The value of the experimentation parameter. It affects the propensity of every opinion about an action.

**RecencyParameter** - The value of the recency parameter. It affects the propensity of every opinion about an action.

**CoolingParameter** - The value of the cooling parameter, which affects the probability of every opinion when the Variant Roth-Erev RL Algorithm is used.

**InitialPropensity** - The value of the propensities at time = 0.

**Algorithm** - A chooser that lets you select among three algorithms: the Roth-Erev RL Algorithm, the Modified Roth-Erev RL Algorithm, and the Variant Roth-Erev RL Algorithm.

**Runs** - Determines the number of runs you want to make.

**Demand-Supply** - Plots the demand and supply curves and thereby shows the clearing price in every iteration. Notice how the supply curve changes over time as the intelligent seller chooses an action.

**Offers Histogram** - A histogram that shows how many times each action of the intelligent seller is chosen during the simulation. The y-axis scale always equals the chosen number of runs; the x-axis indexes the actions. Action 1, with reservation value 10, is at the left of the histogram; action 4, with reservation value 60, is at the right; actions 2 and 3 are between them.

**Propensities** - Plots the value of each action's propensity over time.

**Probability** - Plots the value of each action's probability over time.

**Profits** - Plots the intelligent seller's profit in every iteration.


THINGS TO NOTICE

See how the propensity of the action with ReservationValue 10 increases as the intelligent seller learns over the course of the simulation, as does the probability of choosing this action.


THINGS TO TRY

Change the experimentation and recency parameters to examine their impact on the learning algorithm.

Using the Algorithm chooser, experiment with the three different learning algorithms.


EXTENDING THE MODEL

Try adding another intelligent seller to the model. Maybe an intelligent buyer?


NETLOGO FEATURES

Notice how the intelligent seller creates links (opinions with propensities and probabilities of being chosen) to every reservation value. That is, within this model, not just the sellers, buyers, and market operator are agents; the links (opinions) and the actions are agents as well.


In order to choose an action with different probabilities in every iteration, the Lottery code example from the NetLogo Models Library was used.
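The Lottery example implements roulette-wheel selection: each action is picked with probability proportional to its weight. A minimal sketch of the same idea (in Python for illustration; the model does this in NetLogo):

```python
import random

def lottery(weights):
    """Roulette-wheel selection: return index i with probability
    weights[i] / sum(weights).  Weights must be non-negative."""
    pick = random.random() * sum(weights)
    cumulative = 0.0
    for i, w in enumerate(weights):
        cumulative += w
        if pick < cumulative:
            return i
    return len(weights) - 1  # guard against floating-point round-off

# An action holding all of the weight is always chosen:
lottery([0.0, 3.0, 0.0])  # -> 1
```

In the model, the weights would be the action probabilities computed by the learning algorithm, so actions with higher propensities are drawn more often.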


CREDITS AND REFERENCES

Mridul Pentapalli
Graduate Student (MS Comp. Sc.)
Iowa State University, Ames, IA
March 2008

A comparative study of Roth-Erev and Modified Roth-Erev
reinforcement learning algorithms for uniform-price
double auctions

To refer to this model in academic publications, please use: Martinez-Pinanez, A. (2009). NetLogo Roth-Erev model.

